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Abstract 

Enhanced reality visualization is the process of enhancing an image by adding 
to it information which is not present in the original image. A wide variety of 
information can be added to an image ranging from hidden lines or surfaces to 
textual or iconic data about a particular part of the image. Enhanced reality 
visualization is particularly well suited to neurosurgery. By rendering brain 
structures which are not visible, atthecorrect location in an image of a patient's 
head, the surgeon is essentially provided with X-ray vision. He can visual izethe 
spatial relationship between brain structures before he performs a craniotomy 
and during the surgery he can see what's under the next layer before he cuts 
through. Given a video image of the patient and a three dimensional model of 
the pati ent's brain, the probl em enhanced real ity vi sual ization faces is to render 
the model from the correct viewpoint and overlay it on the original image. The 
relationshi p between the coordi nateframes of the pati ent, the pati ent's i nternal 
anatomy scans and the i mage pi ane of the camera observi ng the pati ent must be 
established. This problem is closely related to the camera calibration problem. 
This report presents a new approach to finding this relationship and develops a 
system for performing enhanced reality visualization in a surgical environment. 

I mmediately prior tosurgery a few circular fiducials are placed near thesurgi cal 
site. An initial registration of video and internal data is performed using a 
laser scanner. Following this, our method is fully automatic, runs in nearly 
real-time, is accurate to within a pixel, allows both patient and camera motion, 
automatically corrects for changes to the internal camera parameters (focal 
length, focus, aperture, etc.) and requires only a single image. 
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Chapter 1 
Introduction 


1.1 Computers in Medicine 

The use of computers in medicine has increased dramatically over the last 
decade [Bemmel eta/., 1985, Reggia and Tuhrim, 1985, Miller, 1990, Chang, 
1993]. As a result, nearly all aspects of medical care have seen improvement 
from the introduction of computer-based tools. These tools range from auto¬ 
mated patient record keeping to three dimensional visualization of internal 
anatomy and from computer assisted diagnosis to automatic drug interaction 
and allergy screening. The use of computers to assist physicians in the plan¬ 
ning and execution of surgical procedures is also growing [Lemoineet a/., 1991, 
Smith eta/., 1991, Pieper eta/., 1992, Verbeeck eta/., 1993]. One area which 
could benefit greatly from more sophisticated computer-based tools is neuro¬ 
surgery. The need to minimize collateral damage while removing diseased 
tissue requires extreme precision. I n addition, damage to certain critical brain 
regions, such as the motor strip, must be avoided if at all possible. Planning 
a surgical approach meeting all of the criteria is difficult and tedious. Identi¬ 
fication of specific brain structures and modification of the planned approach 
are often difficult during the surgical procedure, placing additional emphasis 
on planning. Traditionally, precision neurosurgery requires the use of a stereo¬ 
tactic frame which is rigidly attached to the patient's skull. Figure 1-1 shows 
some typical stereotactic frames. The frame is attached prior to and is visible 
in presurgical imaging such as magnetic resonance (MR) or computed tomog¬ 
raphy (CT) imaging. This allows surgical plans based on presurgical internal 
anatomy scans to be transferred to the patient using the stereotactic frame 
as a reference. Frequently the patient must wear the frame for several days 
between imaging and surgery. The frames are a significant discomfort to the 
patient and are cumbersome to the surgeon, possibly limiting his flexibility 
during the procedure. 

A system which improves surgical precision, enables identification of brain 
structures, allows modification of the planned approach during the surgical 
procedure and does not require the use of a stereotactic frame would be a vast 
improvement over traditional neurosurgical procedures. 
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1.2 What is Enhanced Reality Visualization? 

Computer assisted surgery is a relatively new development which attempts 
to provide the surgeon with a tool to assist in the planning and execution of 
surgical procedures [Adams eta/., 1990, Lavalleeand Cinquin, 1990]. Image 
guided surgery is a specific type of computer assisted surgery which uses ad¬ 
vanced three dimensional visualization techniques to provide the surgeon with 
a wealth of valuable information not normally available in the operating room 
[Pelizzari dtai., 1991, Wellseta/., 1993, Black eta/., 1993, Grimson eta/., 1994], 
In essence, a complex surgical procedure can be navigated visually with great 
precision by overlaying on an image of the patient a color coded preoperative 
plan specifying details such as the locations of incisions, areas to be avoided 
and the diseased tissue. The process of aligning the preoperative plans with 
and over laying them on the patient is known as enhanced reality visualization. 
Enhanced reality visualization is the process of enhancing an image by adding 
information to it. The information added can be anything from text to icons or 
color coding to three dimensional surfaces. Figure 1-2 shows an enhanced re¬ 
ality visualization of a patient about to undergo neurosurgery. Thearea shown 
in ^ isthe tumor to beremoved and the ventricles are shown inH. 



Figure 1-2: Enhanced reality visualization showing atumor and ventricles. 
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1.3 A Scenario Which Utilizes Enhanced Reality 
Visualization 

1. A patient needing neurosurgery is scanned by one or more three dimen¬ 
sional, high resolution, internal anatomy scanners, such as magnetic res¬ 
onance (M R) or computed tomography (CT). 

2. Each internal anatomy scan is segmented by tissue type (white matter, 
gray matter, bone, etc). Thevariousscansofthepatientarealsoregistered 
with oneanother. The result is a three dimensional model of the patient's 
brain. 

3. Tools which identify, classify and label brain structures such as motor 
strip and speech centers are used to add the required detail to the model 
of the patient's brain. 

4. Once the model of the patient's brain has been constructed, enhanced re¬ 
ality visualization is possible. Enhanced reality visualization can be used 
to help plan the surgical procedure. Live video of the patient is mixed 
with information generated from the brain model allowing the surgeon to 
view the internal anatomy of the patient in a non-invasive manner. This 
capability allows the surgeon to test the feasibility of various possiblesur- 
gical approaches on theactual patient. Theenhanced reality visualization 
may be presented to the surgeon using one of several methods, such as a 
head-mounted display, a transparent projection screen or a surgical mi¬ 
croscope. Details regarding the surgical approach and procedure can be 
added to the brain model. 

5. The surgical procedureis performed using enhanced reality visualization. 
Enhanced reality enables the surgical site to be precisely located and 
facilitates accurate transfer of surgical plans to the patient. 


1.4 The Problem and Our Approach 

Enhanced reality visualization is an integral part of the image guided surgery 
paradigm, however compared with other aspects, I ittIeeffort has been expended 
on this area [Adams et a!., 1990, Lavallee and Cinquin, 1990, Pelizzari et a!., 
1991, Wells eta/., 1993, Grimson eta/., 1994]. In order to produce enhanced 
reality visualizations we must be able to quickly and accurately align informa¬ 
tion such as a brain model with an image. There are several other issues which 
must be addressed before a full function enhanced reality visualization system 
can be created. Some of these challenges are listed below. 
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Display method The displays currently available for enhanced reality visu¬ 
alization are less than optimal. Head mounted displays are still heavy, 
awkward and have relatively low resolution. Conventional CRT's have 
better resolution but limit the applications of enhanced reality visualiza¬ 
tion. 

Rendering The complexity of information and the detail and realism with 
which it can be rendered while updating at a reasonable frame rate are 
limited. 

Information acquisition Acquiring information and converting it to a form 
suitable for use in enhanced reality visualization can be a difficult and 
time consuming process. 

Virtual reality faces many of the same issues as enhanced reality visual¬ 
ization. There already exists a significant research effort in virtual reality 
examining these problems. Whilethere are many similarities, enhanced real¬ 
ity differs fundamentally from virtual reality in that it is anchored in the real 
world. Enhanced reality visualization must align the enhancement with a real 
image quickly and accurately. The ability to perform this alignment quickly 
and accurately is fundamental to enhanced real ity visual ization and wi 11 bethe 
focus of this report. Given a video image and a three dimensional model, the 
problem is to render the model from the correct viewpoint and overlay it on the 
original image. The relationship between the coordinate frames of the world, 
the model and the image plane of the camera must be established. This prob¬ 
lem is closely related to the camera calibration problem. Stated more precisely 
the problem is to: 

Deter mi ne the perspecti ve transformati on whi ch maps model coord 7 - 
nates to image coordinates in "real-time", allowing the information 
from the model to be added to an image in the correct location. 

An overview of the problem is shown in Figurel-3. Wesolvefor thetransforma- 
tion which maps model coordinates to image coordinates directly. An alternate 
approach solves for several transformations and then composes them into a 
single transformati on from model to image coordinates. For example, a refer¬ 
ence coordinate system could be defined for the physical object(s). Thefirst step 
might be to find the transformati on which aligns the model with the reference 
coordinate system. Next, the transformation from reference coordinates to im¬ 
age coordinates must be determined. Solving for intermediate transformations 
can introduce error into the solution and is computationally more expensive. 
Unless this data is needed there is no reason to break the problem into several 
pieces. 
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Enhanced Reality 
Visualization with 
Hidden Lines Added 


Figure 1-3: Overview of this report. 

We will define several terms that will be used throughout this report. An 
enhanced reality visualization iscomposedof a virtual imageoverlayedon a raw 
image. A raw image is an image of the real world prior to any enhancement. 
The coordinates of the raw image are referred to as image coordinates. The 
information which will beaddedtotherawimagetoproducetheenhancement is 
referred to as the model. The coordinates of the model are referred to as model 
coordinates. A virtual image is generated by rendering the model from a 
particular view point. The view point captures the relative placement and 
orientation between model and viewer. Finally, the term world coordinates 
is used to refer to an arbitrary coordinate system attached to an object visible 
in the raw image. 

Figure 1-4 shows an overview of our method in the context of neurosurgery. 
A novel formulation of the camera calibration problem allows us to quickly 
and easily obtain the perspective transformation mapping model (MR or CT) 
coordinates to image coordinates. The perspective transformation is then used 
to generate the enhanced reality visualization. Our approach utilizes several 
circular fiducials placed near the surgical site. 

The circular fiducials enable us to recover a scalefactor at each fiducial (the 
focal length of the camera divided by the depth of the fiducial). Given the scale 
factor as well as the image and model (MR or CT) coordinates for each fiducial, 
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Enhanced Reality 
Visualization of 
the Patient with 
the Brain Added 

Figure 1-4: Overview of this report showing the circular fiducial s. 



Laser Scanner Alignment 



Q) 


Model of Patient's 
Brain and Fiducials 


Figure 1-5: Determining the model (MR) coordinates of the fiducials. 
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the problem of determining the perspective transformation which maps model 
coordinates to image coordinates reduces to three sets of linear equations. I n 
general, the model coordinates of the fiducial s arenot known a priori, and they 
must be calibrated. Figure 1-5 depicts the process of determining the model 
(MR or CT) coordinates of the fiducials. During an initial calibration phase a 
laser scanner is used to register the world coordinates of the fiducials with the 
model coordinatesoftheMRorCT data. Oncetheinitial calibration iscomplete, 
our method is fully automatic and uses visual information exclusively. Some of 
the additional characteristics of our approach which make it particularly well 
suited to enhanced reality visualization are: 

1. Requires only a single image with a few fiducials 

2. Does not require internal calibration or known focal length 

3. Accurate to within a pixel 

4. Solution is non-iterative 

The remaining chapters of this report are organized as follows: Chapter 2 
reviews current enhanced reality visualization techniques. Chapter 3 devel¬ 
ops a basic camera model and contains a brief discussion of current camera 
calibration techniques. Chapter 4 presents our method and an overview of its 
implementation. Chapter 5 provides the details (theory, error analysis and 
empirical results) associated with circular fiducials. Chapter 6 discusses one 
method of calibrating fiducials. Chapter 7 shows the results of our method for 
a test object and a plastic skull. Finally, Chapter 8 presents our conclusions. 



Chapter 2 
Related Work 


Several groups of researchers have recently been investigating enhanced real¬ 
ity. The proposed applications for enhanced reality range from laser printer 
repair to aircraft manufacture. Current research efforts in enhanced reality 
visualization differ in many implementation details. The one thing they all 
havein common is the requirement to align a model with an image of the real 
world. In this chapter we will examine several different approaches. 


2.1 Medical Applications 

2.1.1 Aachen University of Technology 

A group at the Aachen University of Technology in Germany has developed 
a "Computer Assisted Surgery” module for use in ENT surgical procedures 
[Adams eta/., 1990]. A model of the patient is produced from presurgical CT 
scans. Radiopaque markings are attached to the patient's skull prior to the 
presurgical scans for use as reference points. The system is calibrated using 
a hand-guided electro-mechanical three dimensional coordinate digitizer. The 
digitizer is used to measure the positions of several reference points. With 
correspondence between digitizer and CT points the transformation from the 
CT coordi nates to di gi tizer coordi nates can be calculated usi ng 3D/3D match i ng. 
Duringan operation the surgeon can usethedigitizertopoint at an unidentified 
structure. Three perpendicular views of the CT data corresponding to the 
location of the digitizer are then displayed on a nearby CRT. The system must 
be recalibrated every time the patient moves with respect to the digitizer. The 
reported accuracy is better than ±lmm. 


2.1.2 TIMB 

A group at TIMB in Grenoble, France has developed a "Computer Assisted 
Medical Intervention” module [Lavallee and Cinquin, 1990]. A model of the 
patient's internal anatomy is produced from presurgical imaging. This system 
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uses a surgical robot or guidance system with modes that range from passive 
to semi-autonomous. In passive mode the robot provides visual feedback by 
overlaying the position of an instrumented probe on the presurgical scans. I n 
semi-autonomous mode some portions of the surgical procedure are performed 
by the robot under the supervision of the surgeon. The system is calibrated us¬ 
ing a special calibration cage made of four Plexiglas planes containing metallic 
balls. The calibration cage is placed near the patient and a pair of X-ray im¬ 
ages are taken. The relationship between the presurgical imaging, the X-ray 
device and the surgical robot can be established using 3D/3D matching. Again 
if the patient moves relative to the robot (or the X-ray device) the system must 
be recalibrated. Accuracy for the instrumented probe is reported as ~5mm. 
Accuracies for other modes are not reported. 

2.1.3 University of Chicago 

A group at the U niversity of Chicago has developed a method for "I nteractive 
3D Patient -1 mage Registration” [Pelizzari etal., 1991]. The method is used to 
position patients for radiation therapy. Again, a model of the patient's internal 
anatomy is produced from presurgical imaging. The model is used to plan the 
geometry of radiation therapy beams. Because of the non-invasive nature of 
radiation therapy it is difficult totransfer the beam geometry planned using the 
model to the actual patient. A Polhemus 3Space tracker and localizer are used 
asa magnetic3D digitizer to measure the surface of the patient. Themodel and 
the measured surface are then registered using 3D/3D surface fitting. Oncethe 
registration has been performed, the magnetic digitizer is used again to mark 
the patient for setup. The intersections of the three principle planes with the 
patient are traced. These marks are then used as reference for positioning 
the therapy machine. The therapy machine must be aligned manually. If 
the patient moves the entire calibration need not be reperformed, however the 
therapy machine must be realigned with the reference marks on the patient. 
Accuracies of ~ 1mm and ~1° are reported. 

2.1.4 Massachusetts I nstitute of Technology 

A group at MIT's Artificial Intelligence Laboratory has developed "An Auto¬ 
matic Registration Method for Frameless Stereotaxy, Image Guided Surgery, 
and Enhanced Reality Visualization”[Grimson etal., 1994]. As in the previous 
work described, a model of the patient's internal anatomy is produced from 
presurgical imaging. A Technical Arts laser range scanner is used to collect a 
set of 3D data points from the patient's skin surface. The model and the laser 
data are registered using 3D/3D surface matching. A special calibration object 
is used to calibrate the laser scanner and calibrate the location of a camera on 
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the laser scanner. Once the model is registered with the laser data and the 
location of the laser scanner camera is calibrated, the model can be overlayed 
with video of the patient. The calibration must be reperformed if the patient 
or camera move and the overlay can only be generated for a single viewpoint. 
The reported accuracy of this method is~lmm. 


2.1.5 Stanford 

A group at Stanford University has developed 'Treatment Planning for a Ra- 
diosurgical System with General Kinematics" [Schweikard etal., 1994]. The 
method is used to plan and perform radiosurgery. A model of the patient's 
internal anatomy is produced from presurgical imaging. The radiosurgery is 
planned using the model. In addition, the model is used to synthesize ra¬ 
diographs from different view points. A total of over 400 such radiographs 
are produced. During the radiosurgery, two nearly orthogonal X-ray images 
of the patient are taken and compared with the precomputed radiographs to 
determine the patient's position and orientation with respect to the treatment 
machine. The patient's position and orientation can be determined about twice 
per second. Thetreatment machine (X-ray system, radiation source, etc.) must 
be calibrated separately using a special calibration routine. The accuracy of 
this method is not reported. 


2.1.6 University of North Carolina 

A group at the University of North Carolina has developed a method for "M erg- 
ing Virtual Objects with the Real World" [Bajura et a/., 1992]. This system 
allows the user to see ultrasound imagery overlaid on a patient in near real¬ 
time. A six degrees of freedom (DOF) Polhemus 3Space tracker is mounted 
on the probe used to acquire ultrasound images. A second 6DOF tracker is 
attached to the head-mounted display (FI M D) used to view the overlay. I mages 
from a camera also mounted on theFIMD are combined with the ultrasound im¬ 
ages to produce the overlay. Since both trackers report position and orientation 
it is a simple matter to transform between ultrasound tracker coordinates and 
FIMD tracker coordinates. In order to transform ultrasound images into the 
coordinate system of the FIMD camera the relationships between ultrasound 
i mages and the ultrasound tracker and the FI M D camera and the FI M D tracker 
must be establ i shed. A sped al cal i brati on j i g i s used to deter mi ne these trans- 
formations periodically. This system allows for motion of both the user and the 
patient. The accuracy of the overlay is not reported. 
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2.2 Other Applications 

2.2.1 Boeing 

A group at Boeing has developed "An Application of Heads-Up Display Tech¬ 
nology to Manual Manufacturing Processes" [Cau del I and Mizell, 1992]. The 
goal of this work is to overlay manufacturing instructions on images of the 
manufacturing process and display them to a worker using an HMD. The in¬ 
structions to be overlayed are derived from a CAD model. The system uses a 
6DOF Polhemus 3D Isotrack magnetic tracking system attached to the HMD 
to generate the overlay. A calibration jig is used to establish the relationship 
between the HMD and the work site. Given the relationship between the HMD 
and the work site an overlay can easily be generated. The accuracy of this 
system is not reported. 


2.2.2 Columbia University 

A group at Columbia University has developed a method for "Knowledge-Based 
Augmented Reality" [Feiner et al., 1993]. The goal of this work is to overlay 
instructions for repairing a laser printer with images of the laser printer. The 
instructions are derived from a knowledge-based system. A Logitech 3D ultra¬ 
sonic tracking system and an Ascension Technology magnetic tracking system 
are used to determine the position and orientation of an HMD and several key 
parts of the laser printer. Using the position and orientation information from 
the tracking system an instruction overlay is generated and displayed to the 
user via the HMD. Frame rates of about 15hz are reported. Accuracy is not 
reported. 


2.3 Discussion 

As discussed in Section 1.4 the transformation which maps model coordinates 
to image coordinates can be divided into several pieces. All of the methods 
discussed above take this approach and all of them calculate a transformation 
which registers the model with some world (reference) coordinate system. Ini¬ 
tially, our discussion will consider only this piece of the larger transformation. 

Current methods of registering the model with the world coordinate system 
are somewhat I i mited. M any of the approaches use magnetic trackers to deter¬ 
mine the transformation which will align the two coordinate frames. Magnetic 
trackers have several significant shortcomings which limit their effectiveness 
for enhanced reality visualization. Magnetic trackers havea very short range, 
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typically a few feet. Accuracies are limited to ~6mm and ~1.5 0 . 1 Perhaps the 
most significant limitation is that magnetic and metallic objects can severely 
degrade the performance of magnetic trackers. This is a significant limitation 
for surgical applications as most operating rooms are loaded with metallic and 
magnetic objects. In addition, current advances in intra-operative imaging 
have made it possible to obtain MR images of a patient's internal anatomy 
during surgery. This environment precludes the use of magnetic trackers. 

Several other types of sensors arealsoused todeterminethetransformation 
which will register the model with the world coordinate system. These sensors 
also have I imitations which makethem less than desirablefor enhanced reality 
visualization. Ultrasonic trackers have rangeand accuracy limitations similar 
to those of magnetic trackers. While they are not susceptible to magnetic 
interference they are limited to line of sight operation. Mechanical digitizers 
have good accuracy but are cumbersome and have limited range. The laser 
scanner provides very accurate position information but also has a short range 
and is limited to line of sight operation. 

Al I of the methods cited above use active sensors to col I ect the data requi red 
to register the model with the world coordinate system. This requires either a 
special environment (mechanical digitizers, magnetic trackers and ultrasonic 
trackers) or a cumbersome piece of equipment (laser scanner and X-ray). There 
are also many situations where active emissions are not desirable. 

The registration produced by most of the medical applications (the excep- 
ti ons arethe work bei ng done by the groups at the U ni versity of N orth Carol i na 
and Stanford University), is limited to a fixed patient and sensor configuration. 
They do not allow for patient motion. This is because the alignment between 
the pati ent and the worl d coordi nate system i s i mpl i ci tl y deter mi ned duri ng the 
initial calibration phase and is not monitored. It is not clear that it is possible 
to extend these methods to allow for motion. [Grimson etal., 1994] claim that 
their method is extensible to cover patient motion, however it is not clear that it 
is practical or possibl e to do so using a laser scanner. In a surgical environment, 
with all but the surgical site hidden under sterile drapes, it is doubtful that 
enough patient surface area will be visible to the scanner to allow registration 
of the patient and model. In addition, using a laser in the operating room might 
be distracting to the surgeons. 

These methods also require a high degree of operator action to register the 
model with the patient. Typically the operator must measure data points by 
hand or edit data that was semi-automatically collected. At least one of the 
methods requires about half an hour to produce a single registration. 

While all of the medical applications register a model obtained from presur- 
gical imaging to a world coordi nate system, only two of the applications (Mas¬ 
sachusetts Institute of Technology and University of North Carolina) actually 


1 These accuracies are for a sensor located between 10 and 70cm from the source. 
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produce enhanced reality visualizations. Neither of these methods can handle 
dynamic changes in focus, zoom or aperture of the camera used to obtain the 
raw image for the enhanced reality visualization. And only the method pro¬ 
duced by the group at the University of North Carolina allows the surgeon to 
change the vi ewpoi nt of the enhanced real ity vi sual izati on. 

None of the current approaches to enhanced reality visualization arepartic- 
ularly well suited to a surgical environment. The sensors used are active and 
arefrequently cumbersome. A good solution for a surgical environment should 
allow for patient motion and should allow the surgeon to see enhanced reality 
visualizations from different view points. It should be fully automatic follow¬ 
ing a straight forward initial calibration. And the method should allow for 
dynamic changes in the camera parameters. We will consider optical sensors 
for a number of reasons. Small, light weight and inexpensive video cameras 
are readi I y avai I abl e. Very accu rate resu Its can be achi eved with these cameras 
which can operate over long ranges. Further, we will obtain the information 
requi red to register the model with the raw i mage enti rely from the raw i mage. 
This has the advantage of ensuring that registration information is available 
exactly when raw images are availableto produce an enhanced reality visual¬ 
ization. Si nee the goal of enhanced reality visualization is to enhance an image, 
the raw image will likely contain a lot of information valuable to performing 
the enhancement. Almost all of the current approaches ignore the information 
contained in the raw image opting for what is essentially a closed loop solution. 

RecentlyfWellseta/., 1993] proposed a method of enhanced reality visualiza¬ 
tion using video information exclusively, however the method requires manual 
alignment of the model with the video image and in some cases requires mark¬ 
ers to appear in both the MR image and videoimage. An optical tracker has also 
been proposed by a group at the University of North Carolina [Wang eta/., 1990, 
Wardeta/., 1992,Gottschalkand Hughes, 1993,Azuma and Biship, 1994]. This 
method is essentially another active sensor not unlike a laser scanner. It uses 
a "sea-of-lights” consisting of nearly 1000 LED's mounted in the ceiling tiles 
of 10' by 12' room. Three cameras mounted on an HMD are aimed at the ceil¬ 
ing whilethe LED's are flashed sequentially. This "optical tracker” requires a 
special ceiling which must be calibrated and a significant amount of additional 
hardware (3 extra cameras, LED control, etc). Neither of these, proposals are 
suitable in our application for the reasons cited above. 



Chapter 3 

Camera Calibration 


3.1 Camera Model 

The pin-hole camera is frequently used to model the transformation from world 
coordinates to image plane. The pin-hole model uses the perspective projection 
model of image formation. Orthographic projection with scale or weak perspec¬ 
tive is used in many computer vision applications, however it is not accurate 
enough across the entire image for our application. For example consider an ob¬ 
ject with 5cm of depth located 100cm away from the camera and 5cm off-axis, 
see Figure 3-1. Under orthographic projection points a and b are collocated, 
however in a real image(usinga 25mm lens) the points are 5 pixelsapart. The 
effect is significant even with the object only slightly off center. The left side 
of Figure 3-2 shows a camera centered Cartesian coordinate system. The optic 
axis of the camera is coincident with the ~ axis. The image plane is parallel to 
the xy plane and located a distance / from the origin. Even though the image 
plane is not required to be parallel to the xy plane, most camera models do not 
explicitly consider this possibility. I mage plane pitch 0 X and tilt 9 y are usually 
reflected in pose. We will start with the assumption that 8 X = 9 y = 0 and then 
in Chapter 4wewill show how our method implicitly models image plane pitch 
and ti 11. The poi nt where the i mage pi ane and the opti c axi s i ntersect i s known 
as the principal point. Under perspective projection a point P c = [r c y c ~ c ] 
projects to point p = y] on the image plane by the foil owing equations 1 : 

•r = /- (3.1) 

~c 

y = f- (3.2) 

~C 

Unfortunately, we are not ableto directly access the image plane coordinates. 
Instead we have access to an array of pixels in a frame buffer or computer 

1 We represent points as row vectors rather than column vectors. This means that points 
will be premultiplied instead of postmultiplied when applying a transformation. This is the 
exact opposite of what is typically used in computer vision, however it is the notation that the 
author was first exposed to and what has stuck. 
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5 pixels 

Figure 3-1: Effect of orthographic projection assumption. 



Figure 3-2: Transformation from world coordinates to the image plane of an 
ideal pin-hole camera. 
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memory. In order to understand the relationship between image plane coor¬ 
dinates and the array of pixels, we must examine the imaging process. The 
image plane of a CCD camera is a rectangular array of discrete light sensitive 
elements. The output of each of these elements is proportional to the amount 
of light which falls on it. The values for each of these elements are read out one 
element at a time row after row until the entire sensor array has been read. 
This analog signal is processed by a frame grabber which converts the camera's 
output to digital values and stores them in the frame buffer. The result is an 
array of digital values in the memory of the computer. The rows of the array 
correspond to the rows of the image sensor. In general, this is not the case 
for the columns. A synchronization signal is provided between rows, however 
the frame grabber samples the signal within a row asynchronously and at an 
independent frequency which may result in a different number of columns per 
row than were present in the camera. Synchronization errors can also cause 
the rows to not line up. This is known as pixel jitter and in extreme cases can 
cause the x and y axes to appear non-orthogonal or skewed [Lenz and Tsai, 
1988]. Most camera models omit skew angle 0^ (theangle between the.r and y 
axes minus 90°). We will start with the assumption that 0 X y = Ofor simplicity 
and then in Chapter 4 we will show how our method implicitly models skew 
angle. A single element of the array in memory is commonly called a pixel 2 . 
We will refer to the row and column number of a given pixel as y' and a-' respec¬ 
tively. Several parameters are defined to quantify the relationship between 
the array in memory and the coordinate system of the image plane. x 0 and yo 
are the pixel coordinates of the principal point. .s x and » y are the number of 
pixels in memory per unit distance in the x and y direction of the image plane. 
These parameters along with /, 0 X , 0 y and 0 ^ are intrinsic or internal camera 
calibration parameters. The projection of point P c to point p' = [.(•' //] in memory 
is described by the foil owing equations: 


/ r TC 

X - .I Q = JS X — 

~c 

(3.3) 

U | U 

> 

<0 

II 

o 

1 

(3.4) 


The right half of Figure 3-2 shows an arbitrary Cartesian coordinate system 
which we will refer to as the world coordinate system. A point P w in the world 
coordinate system is transformed into the camera centered coordinate system 


2 As noted above pixels in memory can differ in size from the underlying image sensing 
elements. Many of the measures of accuracy cited both in this work and in the literature 
should actually be made relative to the image sensing elements. This is straight forward for a 
calibrated camera. We are working with uncalibrated cameras and the relationship between 
image sensing elements and pixels in memory isfrequently not known. Thus for simplicity and 
in spiteof the preceding, we will express our measures in terms of pixels in memory. 



30 


CHAPTER 3. CAMERA CALIBRATION 


by the following equation: 

P C = P W K + T (3.5) 

where TZ is an orthonormal rotation matrix and T is a translation vector. TZ 
and T together are commonly referred to as the extrinsic or external camera 
parameters. 

The pin-hole model assumes an ideal camera. Because of lens distortion, 
real cameras deviate from ideal. The major categories of lens distortion are: 

1. Radial distortion - the path of a light ray traveling from the object to the 
image plane through the lens is not always a straight line. 

2. Decentering distortion - the optic axis of individual lens components are 
not always col I inear. 

3. Thin prism distortion - the optic axis of the lens assembly is not always 
perpendicular to the image plane. 

When it is necessary to explicitly model lens distortion it is typically sufficient 
to model only radial distortion. Our method, developed in Chapter 4, implic¬ 
itly models a linear approximation to lens distortion. 3 Using a 25mm lens of 
average quality we have not found it necessary to explicitly model lens distor¬ 
tion. The maximum radial distortion at the extreme edge of the image for our 
configuration is just a few pixels. 


3.2 Current Camera Calibration Techniques 

Research into the camera calibration problem has a long history originating in 
the field of photogrammetry. For a more complete discussion of camera calibra¬ 
tion techniques see[Slama, 1980, Tsai, 1987]. Camera calibration techniques 
can be divided into three different categories: 

1. Methods which recover only intrinsic parameters. These methods gen¬ 
erally require a special calibration object or stand to allow the internal 
parameter(s) to be measured independent of other parameters. Also, 
these methods assume that the intrinsic parameters do not change. Un¬ 
less extreme care is taken to ensure otherwise, it is almost certain that 
the intrinsic parameters will change perhaps by a significant amount fol¬ 
lowing calibration. Examples of these methods include [Brown, 1965, 
Lenz and Tsai, 1988, Maybank and Faugeras, 1992]. 

3 A discussion of this characteristic can be found in Appendix A. 




3.2. CURRENT CAMERA CALIBRATION TECHNIQUES 


31 


2. Methods which recover only extrinsic parameters (also known as geomet¬ 
ric methods). These methods assume that the intrinsic camera param¬ 
eters are precisely known. This is not always possible for the reasons 
mentioned above. Examples of these methods include [Church, 1945, 
Fischler and Bolles, 1981]. 

3. Methods which recover both intrinsic and extrinsic parameters. These 
methods can be further divided into two categories: 

(a) Nonlinear optimization methods. These methods are both nonlinear 
and iterative. These methods typical ly producethe most accurate re¬ 
sults but requirea large number of features and a significant amount 
of time. Further they are frequently not automatic and need a good 
initial solution to ensure convergence. Finding an initial solution 
can be a difficult problem. Examples of these methods includelFaig, 
1975, Sobel, 1974], 

(b) Linearization methods. These methods linearize the nonlinear pro¬ 
jection equations by introducing additional constraints. The basic 
difference between members of this category is how the problem is 
linearized. If care is not taken when the equations are linearized 
significant bias can be introduced. Many of these methods are it¬ 
erative and under certain conditions fail to converge. These meth¬ 
ods tend to be significantly faster than the nonlinear optimization 
methods. They frequently do not require an initial solution or can 
calculate one with relative ease. These methods still requirea large 
number of points for good results. Examples of these methods in¬ 
clude [Faugeras and Toscani, 1987, Grosky and Tamburino, 1987, 
Tsai, 1987, Goshtasby, 1987, Ganapathy, 1984], 

None of the current solutions to the camera calibration problem are ide¬ 
ally suited to enhanced reality visualization. The linearization methods come 
closest to meeting the requirements of enhanced reality visualization, however 
there is room for improvement. 
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Chapter 4 
Our Solution 


We have developed a novel method for determining the relationship between 
model and image coordinates. Figure 4-1 provides an overview of our method. 
Two key insights lead to a significant simplification of the problem. These 
insights are: 

1. It is not necessary to separate the intrinsic and extrinsic parameters for 
enhanced reality visualization. 

2. It is possible to recover depth information from a single 2D image. 

Utilizing these insights produces a solution that is particularly well suited 
to enhanced reality visualization and meets the requirements discussed in 
Chapters 1 and 2. Our solution is most closely related to the linearization 
methods described in Chapter 3. 



the Brain Added 

Figure 4-1: Overview of this report showing the circular fiducials. 
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4.1 A Perspective Transformation is Enough 

The camera calibration problem is typically posed as follows: 

1 = MXC (4.1) 

where: 


n ' 
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M is a matrix of model points of the form [X Y Z 1], / is a matrix of image points 
of theform |> y ~] resulting from the projection of M onto the image plane, X is 
an external camera calibration matrix, TZ is an orthonormal rotation matrix, T 
is a translation vector, and C is an internal camera calibration matrix. I mage 
points are expressed in homogeneous coordinates to allow the perspective pro¬ 
jection to be captured using linear equations [Duda and Hart, 1973]. The pixel 
coordinates of an image point [x' //] are determined by the following relation¬ 
ships: x 1 = x/z and //' = y/z. These relationships are very similar to (3.3) and 
(3.4). I n fact, x, y and ~ are analogous to the camera centered coordinates x c , y c 
and z t . 

The ultimate goal of most camera calibration is to enable the recovery of 
metric 3D information, such as the pose (position and orientation) of an object, 
from its two dimensional image. Clearly, to recover the pose of an object it 
is necessary to separate the intrinsic and extrinsic parameters. Separating 
the parameters is difficult [Ganapathy, 1984]. The problem is nonlinear and 
several of the parameters are closely coupled. In the presence of noise a single 
solution to the camera calibration problem does not exist, rather there exists 
a set of solutions. These solutions can differ significantly and yet give rise 
to nearly identical images. For example, in the presence of noise significant 
trade offs can be made between t z and /. This can result in a solution which 
looks good from one view point but where neither the intrinsic nor extrinsic 
parameters are correct. The fact that the optimal solution for one view point 
may not be the globally optimal solution is at the heart of what makes camera 
calibration hard. 
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Camera calibration is often performed as a preliminary step in many appli¬ 
cations. A set of camera calibration parameters is recovered and the intrinsic 
parameters are used for future images. This assumes that a globally optimal 
set of intrinsic parameters has been recovered and that the parameters are 
fixed. By using multiple images or multi pie calibration planes the solution can 
be improved in a global sense. However, in the presence of noise small errors 
can lead to large errors for view points significantly different than those used 
during calibration. Further, the intrinsic camera calibration parameters are 
not fixed. They change with the focus and aperture settings. For example, the 
principle point can shift by 8 pixelsor morewith adjustments to focus [Will son 
and Shafer, 1993]. The effective focal length / also varies with focus and aper¬ 
ture settings. Zoom lenses take this variability to an extreme, enabling large 
changes to /. Lens distortion also varies with changes to focus and aperture 
[Brown, 1965]. 

In enhanced reality visualization we are interested in the total transforma- 
ti on from model to i mage coordi nates. We do not need to separate i ntri nsi c and 
extrinsic parameters to generate an enhanced reality image. All of the param¬ 
eters compri si ng al I of the i ntri nsi c and extri nsi c cal i brati on parameters can be 
composed into a single 3x4 matrix. This insight is not new, but how we apply 
it is. The following matrix equation is equivalent to (4.1) and the combination 
of (3.3), (3.4) and (3.5). 


where: 


I = MV 


(4.2) 


V = xc 


rns x f + r 23 x 0 n 2 Syf + /' 13//0 r 13 
r 21 s x.f + r 23 x 0 r 22 s yf + ?’232/0 ?’23 
?’3l Sxf + r 33 .ro , ' 32 s y/ + r 33 ;yo r 33 
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(4.3) 


For our purposes finding values for the elements of V is sufficient. 

A more general internal calibration matrix can be defined as follows: 


C 11 

C 12 

C13 

C 21 

C 22 

C23 

C31 

C32 

c 33 


(4.4) 


This new definition of C has 9 degrees of freedom. These degrees of freedom 
correspond to x 0 , yo, s x , s y , f, 6 Xy , 0 X , 0 y and 9 Z . Only 5 of these 9 degrees 
of freedom are unambiguous. s x , s y and / actually constitute 2 degrees of 
freedom. This is equivalent to saying that C is only defined up to a scale factor. 
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9 X , O y and 0 Z are redundant degrees of rotational freedom which are captured 
in X. Therefore the internal calibration matrix used in (4.1) needs only slight 
modification: the addition of a skew parameter 0 X y . The skew parameter is 
added in the 1 st column, 2 nd row of C. The value of the skew parameter is 
equal to tan 0 X y or change in the x coordinate based on the y coordinate. With 
the addition of this parameter to C the perspective transformation (XC) matrix 
becomes: 


’ rns K f + r 12 tan 9 X y + / I 3 .r 0 r 12 s y f + ?’i3j/o r 13 ' 

,p = r 21 s xJ + r 22 tan 9yy + i'23 J '0 r 22 s y.f + ''23//0 ?’23 ^ 5) 

r 3l s xf + ?’32 tan 6»xy + r 33 x 0 r 32 s y f + r 33 t/ 0 ?’ 33 

tx s x.f + tan 0xy + ^z-?’0 ty s y.f + t z yo t z _ 

V can exactly model non-orthogonal frame buffer axes {9 Xy ± 0) and per¬ 
spective projection with the image plane not perpendicular to the optic axis 
(0 X and/or 0 y ± 0). A linear approximation of radial distortion can also be 
obtained 1 . Not ice that the revised definition ofC has 5 degrees of freedom and 
when combined with the 6 degrees of freedom contained in X accounts for all 
11 degrees of freedom in V. Thus V implicitly models 5 intrinsic and 6 ex¬ 
trinsic parameters. The modeling is implicit because the underlying physical 
parameters are never actually computed. By formulating the problem in this 
manner, we avoid the difficulties associated with decomposing the intrinsic and 
extrinsic camera parameters. We solve (4.2) for each image we obtain. Thus we 
do not need to worry about finding a globally optimal solution, optimal for this 
view point is sufficient. Further, changes to the intrinsic camera parameters 
are inherently captured. 


4.2 Depth I nformation F rom a Si ngle 2D I mage 

Even with the simplifications made so far, the problem of solving for the per- 
spectivetransformation which maps model coordinates to image coordinates is 
still nonlinear. While it is possibleto solve for the elements of V using a mini¬ 
mum of 6 point features and an iterative method, we can do better. Expanding 
(4.2) produces: 


x' = x/z 

pnX + p 21 Y + p 31 Z + p4i 

P13X + p 23 Y + p 33 Z + p 43 


(4.6) 


1 A discussion of this characteristic can be found in Appendix A. 
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y' = y/ z 

pi 2 X + P22Y + P32Z + p 42 ^ ^ 

pi 3 X + /^23 V' + P 33 Z + P43 

If we knew the value of ~ = /;i 3 A’ + m 3 V + P 33 Z + pa 3 then solving for the elements 
of V would reduce to 3 sets of linear equations. Using spatial features, rather 
than point features, enables us to measure a quantity that is proportional 
to ~ for each feature. We call this quantity the local scale factor. The local 
scale factor is equal to the focal length of the camera divided by the depth of the 
feature(s y //~). Weusea circular fiducial as our spatial features [Landau, 1987, 
Thomas and Chan, 1989, Hussain and Kabuka, 1990, Chaudhuri and Samanta, 
1991, Safaee-Rad et al., 1992]. The exact nature of these fiducials will be 
discussed in Chapter 5. In essence, by using a spatial feature we are able to 
recover 2^D information from a single video image. The idea of using spatial 
features is not new, however our use of the information provided by spatial 
featuresis. We will modify our matrix equation slightly by multiplying/andP 
by Xj. Since/ isexpressed in homogeneous coordinates, multiplying/and/orP 

by an arbitrary constant has no effect on the solution {XjV and V represent the 
same solution). We will refer to the elements of -XT as p tJ anddefineJ'= Xj. 
T is a matrix of image points of the form [.r y* 1/s]. $ is the local scale factor at 
the image point. The pixel coordinates of an image point [x' //] are determined 
by x 1 = sx* and y 1 = sy *. x*, y* and 1/s are similar to the homogeneous image 
coordinates defined in (4.1). The elements of V can be solved for using the 
foil owing three sets of linear equations and as few as four spatial features. 

•?’/ = [puXi +p2iYi +P3iZ t +P4i) (4.8) 

yt = (P12 Xi + I >22 y'i + P3lZi + P 42 ) (4.9) 

1 1 Si = {imXi -\-p 23 Xi +P33Z t -\-p 43 ) (4.10) 

Where a§, y* and 1/s,- are the components of the ? th image point and X t , Y t and 

Z t are the components of the ?: th model point. 


4.3 I mpl ementati on 

11 should be noted that by usi ng a spati al featurethe probl em of determi ni ng the 
relationship between model coordinates and image coordinates becomes linear 
and the solution can be found using as few as four features. Also, since all of 
the calibration parameters (both intrinsic and extrinsic) are bundled intoP we 
are not required to make any assumptions about the stability of the intrinsic 
parameters. This is important in dynamic environments because changes tothe 
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focus, aperture and/or zoom will likely be needed during the enhanced reality 
visualization. 

Given a model and an image containing at least four fiducials with known 
model coordinates it is a relatively straightforward task to solve for V and 
generate an enhanced reality image. The basic steps are as follows: 

1. Grab an image 

2. Locate thefiducials in the image and calculate the local scalefactor 

3. Establish correspondence between fiducials in the image and fiducials in 
the model 

4. Solve for V 

5. UsingP, project the model into the image 

6. Display the enhanced reality image 

7. Repeat 

4.3.1 Image Acquisition 

720 x 480 pixel images are acquired using a PulnixTMC-50 color CCD camera 
or a Panasonic WV-CD50 monochrome CCD camera with either a 25mm or 
16mm lens or a Cl DTE K monochrome CID camera with a 29mm lens and a 
Sun VideoPix frame grabber. Both the cameras and frame grabber are rela¬ 
tively inexpensive commodity items. VideoPix is only capable of grabbing ~4 
monochrome or ~1 color frame per second. This severely limits the update rate 
of the enhanced reality visualization. Furthermore, VideoPix only provides 7~ 
bits of luminance information. Even though pixel values range from 0 to 255 
the actual resolution is less than half of this range. We intend to upgrade to 
a better camera/frame grabber combination in the future, however the current 
combination is sufficient to demonstrate our method. 

4.3.2 Fiducial Location 

The location (centroid) and local scalefactor (semi-major axis of the fiducial's 
image divided by the radius of the fiducial) are calculated using moments. 
Chapter 5 describes these calculations in detail. 

4.3.3 Correspondence 

Once an initial correspondence has been established, it should be possible to 
maintain correspondence by tracking the fiducials. The idea is that if images 
can be processed fast enough the locations of the fiducials should not change 
very much. Given the correspondence from the last image, we look for fiducials 
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in the new image within a small region around each fiducial's last location. If 
exactly one fiducial is found in that region then the correspondence for that 
fiducial is maintained. If at least some minimum number of correspondences 
(> 4) are maintained, then the fiducials have been successfully tracked. If 
correspondence is lost or no previous correspondence exists than an initial 
correspondence must be established. The initial correspondence is performed 
using a modified version of the alignment method [Huttenlocher, 1988]. The 
alignment method is modified to use scale information as well as some orien¬ 
tation constraints to significantly prune the search space. The three major 
constraints used are listed below: 

• Each fiducial is visible from only one side. Specifically, the dot product 
of fiducial's normal and the viewing direction must be negative or the 
fiducial is definitely not visible. 

• The local scale factor s, establishes the relative depth of the fiducials up 
to the accuracy of the measurement. 

• The local scale factor sf Is used to effectively unproject the image point 
[xj y'j\. Recall that xj = xjs t and y[ = yjs^ If C is close to a diagonal matrix 
or if a reasonable guess exists for at least some of the intrinsic camera 
parameters , 2 then x* and y* can be treated as the x and y components 
of the camera centered coordinates for the ?: th point. Since scaling is not 
allowed between world coordinates and camera centered coordinates any 
transformation between the two must have a scalefactor close to unity. 

The alignment method uses triples of model and image points to generate 
possible transformations from model to image coordinates. There are a total 
of ( 3 ) ( 3 ) 3 ! different sets of triples where m is the number of model points 
and i is the number of image points. In general the alignment method pro¬ 
duces 2 solutions for every set of model and image points. This is because the 
alignment method assumes orthographic projection and is therefore unable to 
resolve reflections about the xy plane. For an image and a model both con¬ 
taining 7 points the alignment method generates ss 15,000 possible solutions. 
Utilizing the constraints listed above significantly reduces this number. For 20 
random views of an object with 7 fiducials, the number of possiblesolutions was 
reduced from 15,000 to an average of 100. F or one of the vi ews the constrai nts 
reduced the number of possible solutions to 3. Some of these views are shown 
in Chapter 7. The constraints areapplied using only theset of three image and 
model points and before the remaining model points are transformed or global 
constraints are checked. This reduces the computational cost of establishing 
correspondence for an object with 7 fiducials by over 2 orders of magnitude. 

2 I n our experience only *0 and t/o need to be estimated and it is sufficient to use the geometric 
center of the image plane as a fixed estimate of their values. 
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Pseudo code for establishing correspondence follows: 

1. If correspondences > minimum => done. 

2. IfO < correspondences < minimum => establish the required additional 
correspondences. 

3. Otherwises establish correspondences. 

(a) Find candidate transformations from model to image points. 

i. Take each possible triple of model points and pair it with each 
permutati on of three i mage poi nts. 

ii. For each group of model and image points check that the model 
points are consistent and determine from which side they are 
visible. I) and n t are the location and normal of the / th model 
point in the triple. 

• Find the normal to the triple 

n p =(P 2 -Pi)x(P 3 -P 1 ) 

• Verify that the model points can be visiblesimultaneously 

SIGN(?? P x ??i) = SIGN(?? P x n 2 ) = SIGN(?? P x 113 ) 

iii. Transform (rotate and translate) the model points of the triple 
so that the first point is at the origin and the points lie in the xy 
plane. Therearetwotransformations which will accomplish this, 
choose the one which will makethe model points right sideupas 
determined in Step3(a)ii. Call the transformed model poi nts M*. 

iv. Unproject the image poi nts of the triple and translate so that the 
first point is at the origin. Call the transformed image points P. 

v. Calculatethetransformation(s) X which maps M* toP using the 
alignment method. 

vi. Check that the solution computed in Step 3(a)v is consistent. 

• Use relative depth constraints to eliminate one of the two 
solutions. 

• Verify that the solution does not turn the model points upside 
down. Step 3(a)ii ensured that the model points were right 
side up. As long as the transformation calculated in Step 
3(a)v does not rotate more than 90° about any axis in the xy 
planethe model points will remain right sideup. ~ istheunit 
vector in the ~ direction. 


SIGN( \\zX \\) > 0 
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• Verify that the scale factor is close to unity 

vii. If the transformation computed in Step 3(a)v is consistent, com¬ 
pose the total transformation using the transformations from 
Steps 3(a)iii and 3(a)v and the translation from Step 3(a)iv. 

(b) For each consistent transformation determine correspondences be¬ 
tween the remaining image and model points and calculate the cost. 

i. Transform the remaining model points into camera centered co¬ 
ordinates. 

ii. For each transformed model point 

A. Find the closest image point 

B. If the closest image point is close enough, add the correspon¬ 
dence to the match and add the Euclidean distance squared 
to the total cost. 

iii. Return a match consisting of a list of correspondences and the 
total cost. 

(c) Consolidate matches and find the best one. 

i. For each unique list of correspondences create a consolidated 
match consisting of: 

• The list of correspondences. 

• The number of matches containing the list of 
correspondences. 

• The minimum cost among the matches containing the list of 
correspondences. 

ii. Return the best consolidated match which is the one with the 
largest number of correspondences or the largest number of 
matches or the lowest cost, in that order. 

Thealgorithmused in Step 2 to establish partial correspondences is very similar 
to that described in Step 3. If less then three correspondences exist, additional 
pai rs of model and i mage poi nts are combi ned with the exi sti ng correspondences 
to form sets of three model and three i mage poi nts. I f three or more correspon¬ 
dences exist then triples of the established correspondences are used to form 
the sets. The rest of the algorithm is unchanged. The ability to perform partial 
correspondence greatly si mpl ifies the probl em of reestabl i shi ng correspondence 
when most but not all of the fiducials are temporarily occluded. If at least the 
minimum number of correspondences are maintained partial correspondence 
is not used to find correspondences for fiducials that were occluded but are now 
visible. This case is handled nicely as part of locating the fiducial s described in 
Chapter 5. 
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4.3.4 Solvi ng for V 


Equations (4.8), (4.9) and (4.10) are used to solve for the elements of V. If 
correspondences have been established for more than four fiducials than an 
over-determined system of linear equations exists. We solve this problem in 
a least-squares fashion by minimizing the following error terms using House¬ 
holder's QR decomposition [Watkins, 1991]. 



? ’ 2|| 2 


?, 3|| 2 



£ ix 
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(/'ll X + P21V; + pilZi + p 4i)| 
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{pi2 X, + I>22 y'i + I>32 Z; + p 4 2 ) | 

|2 


(P13 + /; 2 3 V'; + + /M3) 


(4.11) 

(4.12) 

(4.13) 


Householder's method exhibits good stability and is relatively efficient. Com¬ 
puting the QR decomposition requires approximately nm 2 - m 3 /3 flops and 
using the decomposition to solve for the unknowns requires approximately 
2 nm - in 2 / 2 flops where n is the number of unknowns and m is the number of 
equations. Splitting the problem into three sets of equations has a significant 
computational advantage. Equations (4.8), (4.9) and (4.10) can be rewritten in 
matrix form as follows: 



= MV i 

(4.14) 

X* 

= mv 2 

(4.15) 

,5' 

= mv 3 

(4.16) 


X% Y* and S arecolumn vectors whose/ th components are.r*, y * and l/s t respec¬ 
tively. M is a m x 4 matrix whose/ th row is the / th model point [X, Y t Z t 1]. V t is 
the ? th column ofP. Matrix M is decomposed intothe matrices Q and R. Since 
M is common to all three equations we need only perform the decomposition 
once. We can simply reuse the decomposition for the remaining equations. In 
essence, you pay for solving (4.14) and you get the solutions to (4.15) and (4.16) 
for very little. Formulating the problem as three sets of linear equations each 
with 4 unknowns reduces the complexity of computing the decomposition by an 
order of magnitude and solving for the unknowns by half an order of magni¬ 
tude compared to solving a single set of linear equations with 12 unknowns. I n 
fact the QR decomposition need not be recomputed until the correspondences 
change, further increasing the computational savings. 


4.3.5 Creating the Enhanced Reality Image 

Creating the enhanced reality image consists of two steps: making a virtual 
image (rendering the model) and combining the virtual image and the raw 
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image. V is used to project the model into an initially empty virtual image. 
Currently, models consist of either a collection of points or a collection of line 
segments. Points areprojected by multiplying byP and then rounding off tothe 
nearest pixel. Line segments are handled by using V to project the endpoints 
into the virtual imageand then drawing a line between them. Noanti-aliasing 
or z-buffering is performed. As a result, some edges are jagged and some 
points which perhaps should not be visible are. Once the virtual image has 
been generated it must be combined with the raw image to form the enhanced 
reality image. Pixels in the virtual image have precedence over pixels in the 
raw image. If a pixel in the virtual image is nonzero (zero being no information) 
then its value is placed in the corresponding location in the enhanced reality 
image. If a pixel in the virtual image is zero then the corresponding pixel in 
the raw image is used. Both the rendering and combination routines are a 
bit simplistic, but are sufficient to demonstrate our method. Rendering and 
combination are important problems, however they are not the focus of this 
work. 


4.3.6 Displaying the Enhanced Reality Image 

The enhanced reality image is displayed as a single image on a high resolution 
CRT using the X window system. The size of the image to be displayed and 
whether it is to be displayed on the local machine greatly affects the time it 
takes to display an image. In the current implementation, about 20% of the 
computation time is spent simply putting a half sized enhanced reality image 
on the screen. Creating a stereo display or using a video see-through HMD are 
straightforward extensions of our method. 

4.3.7 Discussion 

The current system is implemented in Lucid Common Lisp and runs on a 
SparcStation 2. Currently, the two most limiting components are the frame 
grabber and the rendering/display system. Depending on the complexity of the 
model, the Tenderer may require a minute or more to perform the rendering. 
Using a simple model, frame rates of ~2hz can be achieved. Table 4.1 shows a 
breakdown of the computational time required for the major functions. Low end 
SGI machines such as the Indy are capable of grabbing a full size color image 
and displaying it on the screen at> 30hz. TheSGI machines are also capable of 
rendering a fairly complex model and displaying it at > 30hz. Ifthesetimesare 
substituted for frame grabbi ng, renderi ng and di spl ayi ng a frame rate of ~ lOhz 
results. The frame rate would be ~20hz if the time requirements for frame 
grabbing, rendering and displaying could be eliminated entirely. Little effort 
has been put into optimizing the current implementation for speed. Recoding 
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Function 

Time Required 

Grab Frame 

0.25s 

50.3% 

Find Fiducials and Calculate 
Centroid and Local Scale Factor 

0.03s 

6.1% 

Check Correspondences 

0.007s 

1.4% 

Calculate Perspective 
Transformation 

0.01s 

2.0% 

Render Model and Create 

E nhanced Real ity 1 mage 

0.1s 

20.1% 

Display 1 mage 

0.1s 

20.1% 


Table 4.1: Time required for major functions running on a SparcStation 2 in 
Common Lisp using a simple model and displayinga half sized enhanced reality 
i mage. 


some portions and using a C-30 digital signal processing board to grab and 
process images should produce significant improvements in speed. We believe 
frame rates of >30hz should be achievable with this modest hardware. 

Assuming that we are given a model to be used in creating the enhanced 
reality image is not unreasonable. In fact, it is a fundamental assumption of 
enhanced reality visualization. Thebasicidea is to add information to an image. 
This information in most cases is not visible from the current view point and 
must come from some source other than the raw image (typically the model). 
This is not to say that constructing a model is easy. Model building is simply 
not the focus of this work. Assuming that we know the model coordinates of 
the fiducials in some cases is unreasonable. This amounts to assuming that 
thefiducials are part of the model. In Chapter 7 we present enhanced reality 
visualizations of a test object and a plastic skull. The model for the test object 
is a CAD-likemodel and thefiducials are part of it. This is not the case for the 
skull. Here, the model isaCT scan of the skull. Thefiducials are not present 
in the CT data and are not part of the model. In this case, we need a method 
of determining the model coordinates of the fiducials. Chapter 6 presents the 
detai I s of deter mi ni ng the model coordi nates for the fiducials used on the skul I. 




Chapter 5 

Feature Detection and 
Localization 


I n the last chapter we described the theory behind our method. The success 
of any method for enhanced reality visualization is inseparably tied to the 
accuracy of the data used to determine to transformation which maps model 
coordinates to image coordinates. I n this chapter we will discuss the practical 
details of finding fiducials and the accuracy with which their position and local 
scale factor can be determined. 

5.1 Details 

The circular fiducials used in our method are detected using pattern matching 
and are localized using moment calculations. An actual size fiducial is shown 
in Figure 5-1. A chord passing near the center of the fiducial will exhibit 
transitions from light to dark, dark to light, light to dark and dark to light. 
Figure 5-2 shows a blow-up of an image of a fiducial and the intensity profile 
of a chord line. Constraints such as the steepness of the transition, the length 
of the transition and the separation between transitions eliminate nearly all 
detections which do not come from actual fiducials. By checking the rows and 
columns of an image for collocated occurrences of this transition pattern the 
presence of an fiducial can be further validated and a rough position can be 
efficiently found. The time required is linear in the size of the image. In 
addition, a bounding box (.a, x 2 , yi and y 2 ) and an upper and lower threshold 
upper and t| 0W er) for each fiducial can be readily obtained from this process. 
The bounding box, with vertices [% yi], [xi y 2 \, [$2 yi] and [x 2 yil, is slightly 
larger than the smallest rectangle aligned with theaxes which can contain the 
fiducial. The upper and lower thresholds are used to rescalethe pixel values. 
These val ues are needed because we use grey scale moments to find the centroi d 
and local scalefactor of the fiducial. Thelocal scalefactor is the semi-major axis 
of the fiducial's image divided by the radius of the fiducial. The bounding box 
and thresholds for one fiducial are completely independent from those of the 
other fiducials. This produces an extremely robust detection and localization 
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Figure 5-1: Actual size Figure 5-2: Enlarged image of a fiducial with 
fiducial. pixel values for a chord shown to the side. 


algorithm. For example, large gradients in average image intensity, such as 
might be caused by shadows, have little effect on detecting and localizing the 
fiducials. 

Fiducials are detected by looking for occurrences of intensity profiles such 
as the one shown in Figure 5-2. The location and the maximum and minimum 
values of these profiles are used to determine a bounding box and an upper and 
lower threshold for the fiducial. Once a fiducial has been detected moments 
are used to cal cu I ate the centroid and local scale factor of the fiducial. Detailed 
pseudocode used to locate fiducials follows: 

1. Grab a fresh image. 

2. For each fiducial present in the last image: 

(a) Define a window centered at the last location [x' dd y' dd \ with dimen¬ 
sions equal to 2 kr. k is a window size scale factor and r is equal to 
the fiducial's local scale factor .s ; in the last image times the fiducial's 
radius in model coordinates. 1 

(b) Scan the window for fiducials. 

i. For each horizontal and vertical scan line in the region collect 
possible fiducial detects. 

A. Initialize the following parameters: h through U (location 
of transition 1-4), / min and / max (the minimum and maximum 

1 s x/y is the ratio of pixel spacing in the x and y directions U</s y ). This quantity is used to 
correct for the fact that pixels generally are not square. The x dimension of the window must 
be scaled by l/s x/y . 
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separation between transitions), wi through m 4 (width of tran¬ 
sition 1-4), m ax (the maximum width of a transition), si 
through s 4 (slope of transition 1-4), s min (the minimum in¬ 
tensity change to be considered above noise), / up per and /| 0W er 
(upper and lower thresholds). 

B. Scan across or down the scan line until the change in pixel in¬ 
tensity < —^mi n ■ Scan back several pixels, takethe minimum 
value and if it is less than t up per store it in t upper . Update h, iui 
and .si. 

C. Continue scanning until the change in pixel intensity is no 
longer < -,s m j n . Scan ahead several pixels, takethe maxi mum 
value and if it is less than t| 0wer store it in /| 0wer . Update h, wi 
and si. I f w i > iomax go to Step 2(b)i A. 

D. Continue scanning until the change in pixel intensity is > 
.smin- Scan back several pixels, takethe maximum value and 
if it is greater than t| 0wer store it in i\ ower . U pdate/ 2 , io 2 and s 2 . 

E. Continue scanning until the change in pixel intensity is no 
longer > s m in- Scan ahead several pixels, takethe minimum 
and if it is less than t upper store it in t upper . Update/ 2 , w 2 and 
52 . If /min <h-h< /max or s % 96 S2 go to Step 2(b)iA. 

F. Repeat Steps 2(b)iB and 2(b)iC except update/ 3 , w 3 and s 3 . If 

103 10 max or 2 (/ 2 / 3 ) 9 ^ (^3 W) or 53 96 5 2 ~ .s 1 go to Step 

2(b)iA. 

G. Repeat Steps 2(b)iD and 2(b)iE except update/ 4 , © 4 and s 4 . If 

104 > u’max or (/ 4 —1 3 ) 9 ^ 2(/ 2 — / 1 ) ~ (/ 3 — / 2 ) or s 4 9 ^ 53 ~ s 2 ~ Si 

goto Step 2(b)iA. 

H. Add the detection (location, size and thresholds) to a list of 
detections. 

I. Goto Step 2(b)iA 

ii. Consolidate detections. Detections from adjacent scan lines are 
combined if they overlap by at least 50%. Detections from or¬ 
thogonal scan lines are combined if the detections intersect. All 
consolidations retain the maximum bounding box, the minimum 
/upper and the maximum t| 0wer . 

iii. Expand the bounding box as necessary to ensure the fiducial is 
fully enclosed. This is required because if the fiducial is elliptical 
in shape and is at an angletothe .r and y axes, the bounding box 
generated by the detections may under estimate the size of the 
fiducial. 

(c) Calculate the 0 th , 1 st and 2 nd moments of inertia as well as the Euler 
number of the window using grey scale values. 
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(d) If the Euler number is zero return the the location and local scale 
factor [.!■' a' 1/s] for the fiducial. 

3. If there was a valid solution for the last image, predict the location and 
size of each model point for which a correspondence did not exist using 
this solution. Perform Steps 2a through 2d. 

4. If the number of correspondences maintained in Step 2 plus the number 
established in Step 3 is less than the minimum, scan the entire image for 
fiducialsand establish correspondences for any new fiducials found using 
the algorithm presented in Section 4.3.3. 

The zeroth, first and second order moments of a region bounded by .a, * 2 , yi 
and 2/2 can easily be calculated using the followingformulas: 
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(5.1) 

(5.2) 

(5.3) 

(5.4) 

(5.5) 

(5.6) 


r and y are image coordinates and p{x, y) is a weight based on the value 7>(.r, y) 
of the pixel at i mage coordinates x, y. i x , i xy and i y arethe moments of i nertia for 
an individual pixel about the center of the pixel. The weights are determined 
by the following function. 


p( x ,y) 


' 0 
< 1 

7upper~»(j,j/) 
, 7upper-7|ower 


if v(x, y ) > t upper 
if v(x, y) < tiower 

otherwise 


(5.7) 


The Euler number of the region can be used to verify that only a single object 
with a single hole is present. By using an upper and lower threshold we can 
ensure that noisy pixels which are not on the fiducial do not contribute to the 
moment calculations and that noisy pixels which are entirely on the fiducial 
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contribute fully. Euler numbers can also be used to verify that the thresholds 
have been properly chosen. From (5.1) through (5.6) the centroid of the region 
and the moment about the axis of greatest inertia can be calculated. 2 
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(5.8) 

(5.9) 


(5.10) 


It is well known that the perspective projection of a circle is an ellipse. The 
semi -major axi s of the el I i pse can be cal cuI ated usi ng the fol I owi ng equati on : 3 


a = 


4/ n 


\ m 0 (1 + r}frl) 


(5.11) 


For now we will assume that orthographic projection is a reasonable model for 
the area immediately surrounding a fiducial, see Figure 5-3. Later we will 
consider the error introduced by this assumption, Figure 5-4. This error is 
sometimes referred to as perspective distortion. Given this assumption, the 
centroid of the circle C c projects onto the centroid of the ellipse C e and the 
diameter d' projects onto the major axis of of the ellipse, a 1 . The diameter d' is 
parallel to the image plane so it is not foreshortened. Thefollowing equations 
relate the fiducial to its projection. 


a 

x 



o rp'k - rvi^ 


d! 


z* 


y = sy* = y 


(5.12) 

(5.13) 

(5.14) 


It should be noted that s, x 1 , y 1 , x* and y* are the same parameters as in (4.8) 
through (4.10). x\ y’ and s can be easily calculated requiring time linear in the 
number fiducials and their size. At first, the need to use in recovering .s 

2 The quantity shown for 7 max should actually be multiplied by an additional factor of ^ /y . 
We have omitted it for simplicity sake because it cancels with the same factor for in (5.11). 

3 Our fiducials have holes in them and the additional factor of (l + ?f/r 2 ) in the denominator 
corrects for this, r, is the inner radius and r 0 is the outer radius. 
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Figure 5-3: Orthographic projection Figure 5-4: Perspective projection of 
of a circle. a circle. 


wouI d appear to a si gnificant I i mitati on. Thi s i s not the case because s x / y is easy 
to calibrate and it does not change [Lenz and Tsai, 1988, Penna, 1991]. .s x/y is 
a function of the aspect ratio of the image sensor and the ratio of camera and 
frame grabber clock frequencies. The physical properties of the image sensor 
cannot changeand modern clocks haveextremely stablefrequencies. Therefore 
it is very reasonable to calibrate s x/y once and then forget about it. s x/y could 
also be determined via self-cali brat ion removing any burden to the user. 


5.2 Error Analysis 

There are several sources of error associated with processing digital images. 
One of the more significant sources is quantization errors [Kamgar-Parsi and 
Kamgar-Parsi, 1989]. These errors are the result of taking a continuous signal 
and converting it to digital values. First, we will examine errors caused by 
the fact that pixel values are only available at discrete locations in a lattice. 
Consider a row of pixels such as those shown in Figure5-5. The grid represents 
the pixel lattice. Pixels have just two states, on and off, with shaded squares 
representing on pixels. Using the centroid calculation described above both 
rows have the same centroid. The maximum error in the horizontal position 
of the centroid is 0.5 pixels. In the vertical direction the maximum is also 0.5 
pi xel s so the maxi mu m di stance between the cal cu I ated centroi d and the actual 
centroid is 1/^2. 

The maximum error can be reduced significantly by using a circular shape 
[Bose and Amir, 1990, Efrat and Gotsman, 1993]. Figure 5-7 shows a digital 
approxi mati on of a ci rcl e. The i mprovement which results from usi ng a ci rcul ar 
shape is caused by the fact that the error for a given row or column is dependent 
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Figure 5-5: The effect of quanti¬ 
zation errors on the centroid of a 
row of pixels. 



Figure 5-6: The effect of quanti¬ 
zation errors on the length of a 
row of pixels. 



Figure 5-7: A digital approxima- Figure 5-8: Model for greyscale 
tion of a circle. pixel values. 
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Figure 5-9: Error in the centroid of a circular fiducial. Error is the distance 
in pixels between the actual centroid and that of a digital approximation. The 
radius is also expressed in pixels. 

upon the errors of the other rows or columns. I n short, the errors tend to cancel 
out. For a circle of radius r and centroid [x 0 y 0 ] the maximum error in the 
calculated centroid /<(r) is given by the foil owing expression. 

/<(?’)= max (|| [®o ;yo] - CENTROID (so, yo,r,dr)\\) (5.15) 

i’o VO dr 

CENTROI D() is a function which calculates the centroid of a digital approxi¬ 
mation of a circle using (5.1) through (5.3), (5.8) and (5.9). p{x,y) is replaced 
with INSI DE?() which returns a 1 if (x - x 0 ) 2 + (y - y 0 ) 2 < (r + dr ) 2 otherwise 
it returns 0. Figure 5-9 shows the maximum error in the centroid calculation 
/<(?’). Thecircles arethe result of evaluating (5.15)for 0.0 < xo,yo,dr < l.Owith 
0.01 increments. The crosses arethe result of a stochastic sampling method to 
find the maxi mum over the same region. l/\/2r is also plotted on the axes. The 
curve fits the data well and is a good estimate of p(r). 

The radius calculations described above are subject to errors similar to 
those seen for the centroid. The maximum error in the length is 1 pixel, see 
Figure 5-6. Using a circular shape will also reduce the error in the calculated 
radius for the same reasons as above. For a circle of radius r and centroid 
[xoyo\ the maximum error in thecalculated radius v(r) is given by the foil owing 
expression. 

v(r) = max {\\r — RADI US (,ro, yo, r, dr)\\) (5.16) 

i’o HO dr 

RADIUS)) is a function which calculates the radius of a digital approximation 
of a circle using (5.1) through (5.6), (5.8) through (5.10) and (5.11). Again, 
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Figure 5-10: Error in the radius of a circular fiducial. Error is the difference 
in pixels between the actual radius and that of a digital approximation. The 
radius is also expressed in pixels. 

p(x,y) is replaced with INSI DE?() which returns a 1 if (x - ,r 0 ) 2 + (// - yo ) 2 < 
(r + dr) 2 otherwise it returns 0. Figure 5-10 shows the maximum error in 
the radius calculation v(r). The circles are the result of evaluating (5.16) for 
0.0 < .r 0 . yo-dr < 1.0 with 0.01 increments. The crosses are the result of a 
stochastic sampling method to find the maximum over the same region. ^2/3 ir¬ 
is also plotted on the axes. 4 The curve fits the data well and is a good estimate 

Of v(r). 

The maximum error for both the centroid and radius can be further reduced 
by using grey scale values rather than binary values [Chiorboli and Vecchi, 
1993]. Grey scale values can be modeled as the sum of a number of binary 
sub-pixels. Figure 5-8 shows a pixel with a dynamic range of 122 and a value 
of 25. This effectively increases the pixel resolution by the square root of the 
dynamic range, ^t upper - t| 0we r + 1. This increase in the resolution increases the 
effective radius of the circle. 

Another class of quantization error iscaused bythefactthat pixel values are 
theresult of a spatial process. Thevalueof a particular pixel is not the intensity 
at some infinitesimal point, rather it is the average intensity within the area 
of the pixel. Figure 5-8 shows a model of a grey scale pixel. If the shaded and 
unshaded portions of the pixel represents maximum and minimum intensity 

4 The factor of ^2/3 results because (5.11) assumes a elliptical shape and our digital ap¬ 
proximations are not truly ellipses. This error is most pronounced at small radii however some 
error will always be present. 




54 


CHAPTER 5. FEATURE DETECTION AND LOCALIZATION 



Figure 5-11: A pixel partially Figure 5-12: Effect on a circular 
covered by a larger figure. figure. 


respectively, then the pixel has a value of 0.2 or 25 on a scale from Oto 121. If 
the shaded portion is produced by a larger rectangular figure for which we are 
calculating moments, the correct values for x and y to use in (5.2) through (5.6) 
are the x and y components of the centroid of the shaded region. The centroid 
of the shaded region is not the center of the pixel, however (5.2) through (5.6) 
assume that it is, in essence treating the pixel as if it were homogeneous. The 
fact that the centroid of the region which produces a pixel's value may not be 
the center of the pixel introduces error into the moment calculations. Figure 
5-11 shows a pixel only partially covered by a larger figure, p is fraction of the 
pixel covered by the larger figure. The calculated and actual contribution tothe 
first moment, mi calculated and ???i actual , as well as their difference, Ami, are given by 
the foil owing equations: 


"^calculated = PV 

(5.17) 

"^actual = Ply 2 J 

(5.18) 

Ami = ™l calculated - mi adua| 

= p{l-p)/2 

(5.19) 


A maximum Ami of 1/8 occurs when p = 1/2. Ami cannot benegativetherefore 
the maximum error results when Ami = 1/8 along one side of the figure and 
A???i = 0 along the other. We will assume that Ami = 1/8 along the half circle 
shown in Figure 5-12. 5 The circle has a radius of r and Ami is towards the 

5 This assumption overestimates the error on two counts. First Aj?i cannot equal 1/8 
everywhere along the hemi-circle. Second Ami assumes that the partial figure is aligned with 
the pixel grid. If it is not, the maximum error is reduced. 
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Figure 5-13: Error in the centroid 
resulting from the homogeneous as¬ 
sumption. Both the error and radius 
are expressed in pixels 
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Figure 5-14: Error in the radius re¬ 
sulting from the homogeneous as¬ 
sumption. Both the error and radius 
are expressed in pixels. 


center of the circle producing the following expression for the contribution to 
the y component of the centroid: 


Ami y (.r) 



8 ?’ 


(5.20) 


I ntegrating this expression from -r to r and dividing by the area results in a 
maximum error in the centroid of Ay max = 1/16?’ as shown in Figure 5-13. 

An analysis of the error in the second moment is similar. The calculated 
and actual contribution to the second moment, ???2 calculated and ???. 2actual , as well as 
their difference, A ???. 2 , are given by the foil owing equations: 

= p{y 2 + 1 / 12 ) ( 5 . 21 ) 


m 2 actua | 



+ P 3 / 12 


(5.22) 


Ami = rn 2ralculated - m 2 actual 

= t>n (1 - /?) - -^ (l - 3/? + 2/? 2 ) 


(5.23) 
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A 



Figure 5-15: The ideal intensity pro- Figure 5-16: The effect of bleeding on 
filefor a cross section of a circular disk, the disk in the figure to the left. 


The maximum A m 2 and the valueof p at which it occurs arefunctions of y. We 
will assume that p max = 1/2 and Am 2max = y/ 4 . 6 The maximum error results 
when A???2 = y/4 around the entire circumference of the circle. Doubling Am 2 
and correcting for the orientation produces 


A 7772 (. 7 ’) 



(5.24) 


Integrating this expression from -r tor and substituting into(5.11) results in 
a maximum error in the radius of Ar max = r - \jr 2 + 8/3- as shown in Figure 
5-14. 

Next we will consider image formation errors. These errors include noise 
and nonlinearities in the image sensor [Dinstein eta/., 1984, Flealey and Kon- 
depudy, 1994], For our purposes the most significant phenomenon is the 
smoothing of high contrast edges. We will refer to this as bleeding. Figure 
5-15 shows the ideal intensity profile for a cross section of a circular disk and 
Figure 5-16 shows the effect of bleeding on the same disk. We have shown the 
transition from maximum intensity to minimum intensity as linear. This is 
almost certainly not the case, however it makes little difference for our analy¬ 
sis. As long as the transition has the same shape all along the circumference 
of the circle, bleeding has no effect on the centroid. The inertia of the two 
disks shown in Figures 5-15 and 5-16, however are not the same, therefore the 
radius calculation is effected by bleeding. In order to explore this effect we will 
consi der the el I i pse produced by the fol I owi ng functi on 


/(.'-.//) = min (max(c, 0), 1) 


(5.25) 


1 — x 2 / a 2 — y 2 /b 2 
1 — (a — w) 2 I a 2 - 


(5.26) 


a and b arethesemi-major and semi-minor axis of the el I ipse and w isthelength 
of the transition, v was chosen because it is a good approximation to a linear 
transition and is easily integrate. The ellipse is shown in Figure 5-17. The 

6 Actually p max = 1/2 - y ± \/y 2 + 1/12. This function rapidly approaches an asymptotic 
valueof 1/2. For y = 4.2 the actual Am 2 max is within 2%of 1/2. 
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Figure5-17: Intensity profilefor an ellipse. 


zeroth and second moments (about the axis of greatest inertia) for f{x,y) are 
as follows. 


mo = 


m 2 max = 


irb 


7TC 


IV 


— 2a w + 2c 


12 c 


iv — **iv a 


+ 7 ' 


2 - 2 — 6 a 3 iv + 3 a 


w a 


(5.27) 

(5.28) 


We will assume that the actual edge occurs at a - w/2 along the major axis. 
The calculated semi-major axis can be found by substituting m 0 and m 2max into 
(5.11). The ratio of the actual edge location to the calculated semi-major axis 
T](a,b,iv) is a measure of the error introduced by bleeding and is shown below. 


7 /(a, 6, w) 



3 ( w 2 — 2a w + 2a 2 ) 

4u.i 3 a + 7w 2 a 2 — 6a 3 w + 3a 4 ) 


(5.29) 


Figure 5-22 shows a plot of r/(a, 6, iv). The transition lengths we have encoun¬ 
tered are typically less than one pixel. 

So far in our discussion of errors (with the exception of bleeding) we have 
considered circles not ellipses. The analysis extends easily to cover ellipses. 
Two effects are seen as the figure becomes an ellipse. First the effective radius 
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Figure 5-18: The effect of quanti- Figure 5-19: The effect of quan- 
zation errors on the centroid for tization errors on the radius for 
a rectangular figure at an angle a rectangular figure at an angle 
to the pixel lattice. to the pixel lattice. 


isthe semi-minor axis#. Second, additional error is introduced when the major 
axis is not aligned with the pixel lattice. Figures 5-18 and 5-19 show two 
examples of the latter effect. By modifying (5.15) slightly we obtain: 

/<(?’, b/a) = max (||[a’o yo] - CENTROID (xq, yo, r, dr, b/a,0)\\) (5.30) 

i’O VO dr 8 

b/a isthe rati oof the minor axistothe major axis. CENTROI D() and IN SI DE ?() 
are modified appropriately to handle ellipses at any angle 6 relative to the x 
axi s. Figure 5-20 shows the maxi mum error i n the centroi d cal cuI ati on /<(r, b/a). 
A stochastic sampling method was used to find the maximum of (5.30) over the 

region 0.0 < x 0 , yo, dr < 1.0, 0 < # < - and r = 10. 1+ ^^ 1 ~ 6/a ^ is also plotted 
on the axes. The curve fits the data well and is a good estimate of /<(r, b/a). By 
modifying (5.16) slightly we obtain: 

v(r,b/a) = max ( ||r — RADI U S j/o, r, dr, b/a, 9) ||) (5.31) 

I'O yo dr 6 

RADIUS)) is modified appropriately to handle ellipses at any angle# relative 
to the x axis. Figure 5-21 shows the maximum error in the radius calculation 
v(r, b/a). A stochastic sampling method was used to find the maximum of (5.31) 

over the region 0.0 < xo,y 0 ,dr < 1.0, 0 < # < - and r = 10. j s 

\/36/2 

also plotted on the axes. The curve fits the data well and is a good estimate of 

v{r, b/a). 
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Figure 5-20: Centroid error for an el- Figure 5-21: Radius error for an ellip- 
liptical fiducial with a radius of 10. tical fiducial with a radius of 10. The 
The error is expressed in pixels and error is expressed in pixels and b/a is 
b/a is the ratio of minor and major the ratio of minor and major axis, 
axis. 

Finally we will consider the errors introduced by our assumption of ortho¬ 
graphic projection for a fiducial. As shown in Figure 5-4, some error is intro¬ 
duced in the centroid as well as the semi-major axis. Consider a plane rotated 
by an angle of <f> about an axis passing through [ Vo Y 0 Z 0 ] in camera centered 
coordinates which is parallel to the x axis. Let [X Y] represent a point on the 
plane and let the origin of the plane be [X 0 ! 0 Z 0 ]. Points on the plane project 
on to the image plane by the foil owing relationships 


X 


/ 


/ 

y 


f(X + X o) 

V sin <j) + Z 0 

(5.32) 

f 0 cos <> + V 0 ) 

Ys\r\<p + Z 0 

(5.33) 


Next, consider a circle in the plane and centered at the origin with a radius of 
r. We can determine the effects of perspective distortion on the centroid and 
radius calculation by evaluating foil owing continuous versions of (5.1) through 
(5.6). 
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mo = 


???x = 


m v = 


m x i = 


m y 2 = 


r\/r 2 -y 2 

\ Sx'Sy' 

-s/XXp- 

(5.34) 

1' V r 2 — y 2 

\ x'Sx'Sy' 

-s/XX?- 

(5.35) 

r V T 2 —y 2 

1 y' Sx'Sy' 

- v r 2 -y 2 

(5.36) 

r v r 2 —y 2 _ 

1 v /2 8x / 8v / 

~ V r 2 -y 2 

(5.37) 

ry r 2 -y 2 

1 X 'y'8x'8y' 

“V r 2 -y 2 

(5.38) 

l' V r 2 — y 2 

1 x' 2 8x'8y' 

-s/xx?- 

(5.39) 


The error in the x component of the centroid a is the difference between the 
projection of A 0 and ??%/???. 0 . The error in they component : /3 is defined similarly 


Q = 


fZ 0 Xo 


Zl - r 2 sin 2 t 


fX o 

Z 0 


5.40) 


/ (z^r 2 si n! V + Y o r ' 2 cos </>si n 2 </> - 2 0 r 2 si n <f> - V 0 2 Z 0 si n <f> + z £) 0 cos </>) / y 0 

(/q - r 2 sin 2 </>) ( 2 b cos •# - 5 o sin $ 2 0 

The equation quantifying the effect of our orthographic projection assumption 
on the semi-major axis is as follows. 


aZ o 



(5.41) 


The full version is much too messy to include here, a is the semi-major axis 
of the projection and can be solved for using (5.34) through (5.39), (5.10) and 
(5.11). Figures 5-23 through 5-27 show plots of a, l and 7 for typical values: 
R 0 = 10cm, Z 0 = 100cm, r = 0.5cm and / = 2000 pix els. A 0 and V 0 are 
converted to polar coordinates such that Rq = \Jx^ + 1 0 2 . R 0 is fixed and 6 is 
one of the axes plotted. Although we have not found it necessary, estimates of 
the transition length and the minor axislength can be easily obtained from the 
imageand used to improve the calculated centroid and semi-major axis. 
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Figure 5-22: The effect of bleeding Figure 5-23: Perspective error in the 
on the radius calculation. Error is centroid parallel to the major axis, 
expressed as the ratio of the actual The error is expressed in pixels, 
radius and the calculated semi-major 
axis, d'/a'. The transition length and 
radius are expressed in pixels. 



Figure 5-24: Perspective error in the Figure5-25: Total perspective error in 
centroid perpendicular to the major thecentroid. The error is expressed in 
axis. The error is expressed in pixels, pixels. 
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Figure 5-26: Perspective error in the Figure 5-27: Perspective error in the 
semi-major axis for R 0 = 10cm. Error semi-major axis for R 0 = 10cm. Error 
is expressed as the ratio of the calcu- is expressed as the ratio of the calcu¬ 
lated value and the actual radius. lated value and the actual radius. 

5.3 Experiments 

I n the absence of noise, two images of a fiducial taken from the same position 
should produce the same centroid and semi-major axis. If our calculations 
wereexact, a series of images taken from positions displaced only in a direction 
perpendi cu I ar to the opti c axi s shou I d produce centroi ds that vary I i nearly with 
position. Similarly, a series of images taken from positions displaced only in 
a direction parallel to the optic axis should produce semi-major axes that vary 
linearly with the inverse of position. The degree to which real data deviate 
from these ideals is an empirical measure of the accuracy of our calculations. 

Two sets of experiments were conducted using an optical bench. The first 
set of experiments examined the centroid calculations, the second set the semi¬ 
major axis calculations. A rail with a precision positioner runs along one side 
of the optical bench. For the first set of experiments, a camera was mounted 
on the positioner with its optic axis perpendicular to the rail. This setup 
allows camera motion only in the x direction. Motion in the y direction is 
achieved by rotating the camera 90° in its mounting. A Klinger DCS-750 
motor controller with UE-72CC positioner was used to precisely position the 
camera. The controller/positioner combination is accurate to a few microns. 
On the optical bench roughly a meter away a fiducial was mounted so that it 
appeared near the center of the camera's field of view, see Figures 5-28 and 
5-29. Experiments were performed for the foil owing cases: 
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Optical Bench Optical Bench 


Figure 5-28: Top view of experimental Figure 5-29: Side view of experi men - 
setup for centroid. tal setup for centroid. 

• 10 micron and 100 micron steps 

• Motion in both the.r and y directions 

• Fiducial parallel to the image planeand at a 45° angle 

• Light and dark images 

For each experiment, data was collected at 26 camera positions along the rail. 
At each position, 100 images were collected. The mean and standard devia¬ 
tion were calculated for each position. The results are shown in Figures 5-30 
through 5-41. The mean at each position is marked by an "x". The error bars 
are 1 standard deviation above and below the mean. The line is the least 
squares best fit to the means. Table 5.1 shows both the theoretical and empir¬ 
ical accuracy of the centroid calculation for three conditions. The theoretical 
and empirical results are in good agreements. 


Condition 

Dynamic 

Range 

Major 

Axis 

Empirical 

Accuracy 

Theoretical 

Accuracy 

Bright, flat 

80 

17.5 

0.065 

0.087 

Dark, flat 

6 

17.5 

0.15 

0.16 

45° Angle 

40 

17.5 

0.22 

0.16 


Table 5.1: Empirical and theoretical accuracy for centroid calculations. 
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Figure 5-30: Data for bright image, 
camera motion in the.r direction with 
100 micron steps and fiducial parallel 
to the image plane. 



Figure 5-31: Data for bright image, 
camera motion in the .r direction with 
10 micron steps and fiducial parallel 
to the image plane. 



Figure 5-32: Data for bright image, 
camera motion in they direction with 
100 micron steps and fiducial parallel 
to the image plane. 



Figure 5-33: Data for bright image, 
camera motion in they direction with 
10 micron steps and fiducial parallel 
to the image plane. 
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Figure 5-34: Data for dark image, 
camera motion in the .r direction with 
100 micron steps and fiducial parallel 
to the image plane. 



Figure 5-35: Data for dark image, 
camera motion in the .r direction with 
10 micron steps and fiducial parallel 
to the image plane. 



Figure 5-36: Data for dark image, 
camera motion in they direction with 
100 micron steps and fiducial parallel 
to the image plane. 



Figure 5-37: Data for dark image, 
camera motion in they direction with 
10 micron steps and fiducial parallel 
to the image plane. 
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Figure 5-38: Data for bright image, 
camera motion in the .r direction with 
100 micron steps and fiducial 45° to 
the image plane. 



Figure 5-40: Data for bright image, 
camera motion in they direction with 
100 micron steps and fiducial 45° to 
the image plane. 



Figure 5-39: Data for bright image, 
camera motion in the .r direction with 
10 micron steps and fiducial 45° to the 
image plane. 



Figure 5-41: Data for bright image, 
camera motion in they direction with 
10 micron steps and fiducial 45° to the 
image plane. 
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Figure 5-42: Top view of experimental Figure5-43: Side view of experi men - 
setup for semi-major axis. tal setup for semi-major axis. 

A similar set of experiments were conducted for the semi-major axis. For 
this set of experiments, the camera was mounted with its optic axis parallel to 
the rail. This setup allows camera motion only in the ~ direction. As before 
a fiducial was mounted on the optical bench roughly a meter away so that it 
appeared near the center of the camera's field of view, see Figures 5-42 and 
5-43. Experiments were performed for the foil owing cases: 

• Fiducial parallel to the image planeand at a 45° angle 

• Light and dark images 

For each experiment, data was collected at 26 camera positions along the rail. 
At each position, 100 images were collected. The mean and standard deviation 
were calculated for the major axis at each position. The least squares best fit to 
the means was also found. The results are shown in Figures 5-44 through 5-46. 
Table 5.2 shows both the theoretical and empirical accuracy of the semi-major 
axis calculation for three conditions. The theoretical and empirical results are 
in good agreements. 


Condition 

Dynamic 

Range 

Major 

Axis 

Empirical 

Accuracy 

Theoretical 

Accuracy 

Bright, flat 

40 

17.5 

0.15 

0.16 

Dark, flat 

4 

17.5 

0.23 

0.24 

45° Angle 

50 

16.8 

0.24 

0.23 


Table 5.2: Empirical and theoretical accuracy for semi-major axis calculations. 
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Figure 5-44: Data for bright image, 
camera motion in the ~ direction with 
2000 mi cron steps and fiducial parallel 
to the image plane. 



Figure 5-45: Data for dark image, 
camera motion in the ~ direction with 
2000 micron steps and fiducial paral¬ 
lel to the image plane. 



Figure 5-46: Data for bright image, 
camera motion in the ~ direction with 
2000 micron steps and fiducial 45° to 
the image plane. 
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5.4 Discussion 

An accurate, efficient and robust method of locating fiducials has been de¬ 
scribed. A thorough error analysis of the method has been provided including 
both theoretical and empirical data. This data confirms that in the worst case 
fiducials can be located to within 0.25 pixels and the local scale factor can be 
determined to within 1.5%. 
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Chapter 6 

Initial Calibration 


Before our fiducials can be used to perform enhanced reality visualizations, 
their model coordinates must be known. In cases were the model coordinates of 
the fiducials are not known a priori, an initial calibration must be performed. 
An initial alignment is performed using data from a laser scanner [Grimson et 
at., 1994], This initial alignment along with an image showing the fiducials is 
then used to lookup the model (MR or CT) coordinates of the fiducials. Figure 
6-1 shows an overview of this process. Once the coordinates of the fiducials 
have been established the laser scanner is no longer needed and any camera 
which can view the fiducials can be used for enhanced reality visualization. In 
this chapter we provide a general discussion of how the laser scanner works 
and then give the details of fiducial calibration. 



Laser Scanner Alignment 



<S) 

Model of Patient's 
Brain and Fiducials 


Figure 6-1: Determining the model (MR) coordinates of the fiducials. 
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TJ 

y )f -'' ’ 



Figure 6-2: Side view of scanner. Figure 6-3: Object and laser light 

plane from video camera perspec¬ 
tive. 


6.1 Laser Scanner 

For the skull data shown in Chapter 7, the initial calibration was performed 
with the aid of a laser range scanner produced by Technical Arts Corporation. 
It produces a planar sheet of light which is scanned using an oscillating mirror. 
A video camera is placed at an angle to the plane of light, see Figure 6-2. 
The x axis is parallel to the line segment joining the laser and video camera. 
The laser is oriented so that the line of illumination formed when the laser's 
plane of light strikes an object is perpendicular to the x axis. The y axis is 
parallel to the line of illumination, see Figure 6-3. The ~ axis is orthogonal 
to both the x and y axes. The y axis in Figure 6-2 is into the page and the x 
axis in Figure 6-3 is out of the page. The three dimensional coordinates of an 
object's surface can be determined if it is placed so that it can be simultaneously 
illuminated by the laser and viewed by the video camera. They coordinateof a 
data point is determined directly from the horizontal displacement measured 
by the video camera. The .r and ~ coordinates are recovered using the vertical 
displacement and the scan angle of the laser beam. A single scan produces 240 
three dimensional measurements with accuracies up to 0.003". 

Surface points on the patient's head near the surgical site are measured 
with the scanner. These data points are aligned with a model of the patient's 
head and brain obtained from previous MR and/or CT scans [Grimson eta/., 
1994]. The registration is produced by minimizing the sum of the squared 
distance between the laser data and the model. The model is sampled at several 
resolutions to speed convergence and random perturbations are used to avoid 








Figure 6-4: Model of patient's brain Figure6-5: Patient, scanner and coor- 
with coordinate axes. dinateaxes. 



Figure 6-6: Model of patient's brain Figure 6-7: Model of a skull. 

aligned with patient (validonlyforthe 

scanner). 
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Figure 6-9: Video of skull being scanned. 
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Figure 6-10: Laser data aligned Figure 6-11: Model aligned with 
with skull. videoof the same skull. 

local minima. This produces a transformation from the coordinate frame of the 
model to that of the patient. Figure 6-4 shows a brain model and its coordinate 
system. Figure 6-5 shows the patient, scanner and their coordinate system. 
Figure 6-6 shows the result of aligning the model and the patient coordinate 
systems. It should be noted that Figure 6-6 is valid only from the perspective 
of the camera attached to the laser scanner. 

Figure 6-7 shows a model obtained from CT imagingofa plasticskull. Figure 
6-8 shows the three dimensional data obtained from laser scanning the same 
skull. Figure 6-9 shows two images of the skull containing laser scan lines. 
Figure 6-10 shows the laser data aligned with the skull. Figure 6-11 shows the 
model of the skull aligned with and superimposed upon videoof the skull. The 
accuracy of this registration is believed to be on the order of the resolution of 
the M R or CT data lmm). 

6.2 Calibration Routine 

The transformation used to produce Figure 6-11 essentially maps model coor¬ 
dinates to image coordinates. This is exactly the inverse of what we need to 
calibrate our fiducial s. What we would I ike to be able to do is measure the im¬ 
age coordi nates of the fiducials and then use an i n verse mappi ng to deter mi ne 
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their model coordinates. The inverse mapping is not in general a one to one 
mapping. The image coordinates of a fiducial determine the ray in three space 
along which the model coordinates of the fiducial must lie. By intersecting 
this ray with the model and Z-buffering the result we can determine the model 
coordinate of the fiducial. 

The initial calibration is limited by the accuracy of the registration per¬ 
formed using the laser scanner. The laser scanner registration produce results 
which are both repeatable and good for the single view in question. It is not 
clear how accurate the registration is in an absolute sense. Absolute accuracy 
is important for calibrating thefiducials. For example, small errors which are 
imperceptible from one point of view frequently lead to large errors from dif¬ 
ferent view points. Perturbing the laser scanner solution by less than 1° can 
change a fiducial's location by over 6mm. The uncertain accuracy of the fiducial 
calibration bears significantly on the quality of enhanced reality visualizations 
which can be produced. In spite of this issue, the results shown in Chapter 7 
are promising. The exact source and nature of the errors in the initial calibra¬ 
tion needs to be explored further. The method of initial calibration presented 
here, while it requires the use of a laser scanner, has the advantage that it can 
be made fully automatic. Of course, other methods of initial calibration could 
be used. The only requirement is that the fiducial locations be determined 
accurately. How this information is obtained does not matter. 



Chapter 7 
Results 


7.1 Test Object 

To determine the accuracy of our method we performed several experiments 
using a special test object. A three dimensional object with seven fiducials 
was made. The relative positions of the fiducials were accurately measured. 
Figure 7-1 shows the basic test object. The large pillar near the center is used 
to measure the accuracy of the enhanced reality visualization. A wireframe 
corresponding to the edges of the pillar is displayed in the enhanced reality 
image. The difference between the actual edges and wireframe is a measure 
of the accuracy of the visualization. For comparison purposes a wireframe is 
also superimposed on a shorter pillar. Experiments were performed with four 
slightly different fiducial configurations. The first configuration consists of 6 
coplanar fiducials plus a single fiducial 1cm above the plane. The enhanced 
reality visualizations produced from this configuration are not consistently ac¬ 
curate. Figures 7-2 through 7-5 are typical of the results for this configuration. 
Notice that the wire frame on the smaller pillar matches in all of the images. 
Errors occur only significantly away from the volume enclosed by thefiducials. 
The second configuration moves one of the 6 coplanar fiducials so that it isalso 
lcm above the plane. Figures 7-6 through 7-8 show typical results for this 
configuration. The accuracy is greatly improved with the addition of a second 
noncoplanar fiducial, however occasional slight errors do occur. The next config¬ 
uration adds a third noncoplanar fiducial. Typical results for this configuration 
are shown in Figures 7-9 and 7-10. With this configuration, errors rarely occur 
and when they do they aresmall. Thefinal configuration is similar tothefirst 
except that the noncoplanar fiducial is 4cm out of the plane. Results for this 
configuration are shown in Figures 7-11 and 7-12. The accuracy of this config¬ 
uration is comparable to that of the last. Some depth in the model is required 
to accurately recover the third dimension. Once sufficient depth is present in 
the model, it appears that the solution is accurate over a large volume. 

As shown in Figures 7-2 through 7-12, under reasonable conditions our 
method produces good results. These figures provide only a subjective measure 
of accuracy. Next we will attempt to provide a more quantitative measure. 
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Figure 7-7: Test object view 6. 


Figure 7-8: Test object view 7. 
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Figure 7-11: Test object view Figure 7-12: Test object view 
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Average 

Distance 

RMS 

Distance 

Maximum 

Distance 

Median 

Distance 

Number of 
Points 

1.35 

1.46 

2.94 

1.38 

848 


Table 7.1: Distance between edge-based and enhanced reality vertices. 


Average 

Misalignment 

RMS 

Misalignment 

Maximum 

Misalignment 

Median 

Misalignment 

Number of 
Points 

0.88 

0.91 

1.77 

0.87 

212 


Table7.2: Misalignment between edge-based and enhanced reality polygons. 


Given that the goal of enhanced reality visualization is to produce an enhanced 
image, the proper way to assess accuracy is to consider the deviation of the 
enhanced image from the ideal image. For various reasons we do not have 
access to the ideal image. However, we can compare the deviation between 
the image of a physical object and the enhancement produced from a model of 
the same physical object. For example, we could compare the wire frame to 
the edges of the test object's central pillar. This is essentially what we will do. 
Figure 7-13 shows the same test object used above with one change, the top of 
thecentral pillar had been darkened to provide high contrast edges. Edge pixels 
are extracted and chained together using a Canny edge detector. A line is fitted 
to the chains by minimizing the distance between the edge pixels and the line. 
These lines areused to compute two measures of accuracy. Thefirst measure is 
the distance between the vertices of the darkened region as determined by our 
method (enhanced reality visualization) and those determined by intersecting 
the lines recovered above. The second measure is the area of misalignment 
divided by the perimeter or the average distance between the two polygons, see 
Figure 7-14. 

A sequence of 212 images of the test object rotating through 36(1 was used. 
The first measure was computed for each of the four vertices in each image. 
The distance between the vertices in each image are shown in Figures 7-15 
through 7-18. The second measure was computed for the darkened polygon 
in each image and the average distance between the polygons for each image 
is shown in Figure 7-19. Tables 7.1 and 7.2 summarize these results. The 
edge-based positions agree well with those produced by our method. It should 
be noted that edges and vertices recovered using the Canny edge detector are 
not without error. The most significant source of error is the fact that the 
implementation used only localizes the edge to the nearest pixel. As a result of 
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Figure7-13: Test object used to Figure 7-14: Misalignment of 
quantify accuracy. edge-based rectangle and en¬ 

hanced reality rectangle. 




Figure 7-15: Distance in pixels be- Figure 7-16: Distance in pixels be¬ 
tween edge-based and enhanced real- tween edge-based and enhanced real¬ 
ity positions for vertex 1. ity positions for vertex 2. 
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image 

Figure 7-17: Distance in pixels be¬ 
tween edge-based and enhanced real¬ 
ity positions for vertex 3. 


3 - 



0 0 50 [So l 3 (j 2M 

image 


Figure 7-18: Distance in pixels be¬ 
tween edge-based and enhanced real¬ 
ity positions for vertex 4. 
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image 

Figure 7-19: Average distance be¬ 
tween edge-based and enhanced real¬ 
ity polygons. 



84 


CHAPTER 7. RESULTS 


this, the first measure probably over esti mates the difference between the edge 
image and the enhanced image. A signed measure of the distance from point to 
line 1 was also computed to check for correlation in the errors. The average for 
the first measure was 0.09 pixels and for the second measure was 0.04 pixels 
making it unlikely that the errors in edge-based positions are correlated with 
those in the enhanced reality positions. The average difference between the 
two perimeters is likely the better measureof accuracy and using this measure 
our method is accurate to within a pixel. 


7.2 Skull 

After calibrating the fiducials as described in Chapter 6, enhanced reality vi¬ 
sualizations were performed from several different view points. Figure 7-20 
shows the initial view of a plastic skull. Figure 7-21 shows the results of the 
registration using the laser scanner. The white dots are CT data points for the 
skull superimposed upon an image of the skull. Figure 7-22 shows the results 
of our method using the initial view point. Note that as expected the error in 
the Figures 7-21 and 7-22 are comparable. Figures 7-23 through 7-32 show the 
results of our method using ten novel viewpoints. The exact source of the mis¬ 
alignment present in some of the figures is not known. Two likely sources are 
the initial calibration and the fact that the fiducials are nearly coplanar. The 
errors are largest for view points significantly different from that used during 
the initial calibration which suggests that at least some of the misalignment 
is caused by errors introduced during the initial calibration. Also, as noted in 
Chapter 6 small perturbations in the laser scanner alignment cause relatively 
large variations in thecalibrated positions of thefiducials. A morerobust initial 
calibration and a method of handling coplanar fiducials, which together should 
eliminate these errors, are currently being investigated. 


^hesign indicates which side of the line the point is on. 
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Figure 7-29: Skull view 7. 


Figure 7-30: Skull view 8. 
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Figure 7-31: Skull view 9. 


Figure7-32: Skull view 10. 


Chapter 8 
Conclusions 


8.1 Future Work 

There are many improvements and extensions which can be made to the basic 
method presented in this report. Several of them have been alluded to in 
previous chapters. 

8.1.1 Auto calibration 

The current implementation requires knowledge of the rati oof pixel spacing in 
the .r and y directions, ,s x/y in order to recover m While s x/y is extremely stable 
and determining its value is straight forward, it would be nice to eliminate 
this requirement. Given enough noncoplanar fiducials it should be possible to 
solve for .s x/y . Solvingfor s x/y might be fairlyti me consuming but it should only 
need to be performed once. Similarly, it would be nice to beableto handle lens 
distortion (although we have not found it necessary). It is a simple matter to 
add a lens distortion model as a preprocessor. The fiducial locations are simply 
corrected by the amount specified by the model. In some cases considering dis¬ 
tortion would surely improve the results. Unfortunately this requires finding 
the proper distortion model. There are several well established methods for 
determining lens distortion. Ideally, the model would be determined automat¬ 
ically. Lens distortion changes with several of the other camera parameters 
such as aperture, focus and zoom. Even so, given several data sets consisting 
of image points, model points, perspective transformation, aperture, focus and 
zoom settings we should be able to construct a model which takes into account 
the effect of changes to aperture, focus and zoom. It should be necessary to 
construct a new model or modify an old model only occasionally. Our method 
as it is currently implemented models a linear approximation to lens distor¬ 
tion implicitly. Automatically determining a valuefor % /y and a model for lens 
distortion are two examples of auto calibration. Auto calibration would bridge 
the gap between what is implicitly modelableand other required or desired 
parameters. The parameters which are implicitly modeled are those which 
can change from one image to the next. The parameters which are candidates 
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for auto calibration are either fixed or vary on much longer time scales. This 
makes it possibletorun an auto calibration routine in the background updating 
parameters as necessary but certainly not every frame. Auto calibration would 
enable a completely uncalibrated camera with a poor quality lens to be used to 
produce very accurate enhanced real ity vi sual izati ons. 

8.1.2 General Features and Self Extending Models 

Where auto calibration enables camera parameters to be recovered, self ex¬ 
tending models allow recovery of model parameters. Given a partial model and 
several solutions from different view points it should be possibleto recover the 
three dimensional location of points not in the model but which are visible. 
This effectively extends the model. The ability to use general features in place 
of fiducials would be a significant improvement for many potential applications 
and makes self extending models truly useful. The circular fiducial currently 
used facilitates recovery of depth information. The same kind of size informa¬ 
tion can potentially be recovered from naturally occuring spatial features such 
as patches of texture, etc. Another possible option is to use stereo to recover 
depth information. Since relative depth is all that is required the stereo cam¬ 
eras need not be calibrated. Using a stereo setup would eliminate the need to 
know .s x /y and would facilitate a stereo display. 

8.1.3 Miscellaneous 

Several other more modest improvements or extensions also exist. As noted in 
Chapters 4 and 7, noncoplanar points significantly improve the results of our 
method. With a slight modification to our method it should be possibleto use 
planar data exclusively. The modification involves solving a quartic equation. 
The ability to function when all of thefiducials arecoplanar would certainly be 
an asset. It is not clear what accuracy can be achieved by this approach. As 
noted in Chapter 6 there is room for improvement in the initial calibration of 
fiducials. At a minimum a better understanding of the source and nature of 
errors is needed. A more robust method of calibrating fiducials would also be 
very useful. Finally, the current implementation can be optimized significantly 
for speed. 


8.2 Applications 

Neurosurgery is just one application for enhanced reality visualization. There 
are numerous other potential applications both inside and outside the medical 
field. These applications range from manufacturing and repair to navigation 
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and rescue. Enhanced reality visualization can be most readily applied in 
domains where models either already exist or are easily obtainable. In the 
medical field models are easily obtained using internal anatomy scanners such 
as MR and CT. Models already exist for many repair and manufacturing en¬ 
vironments. Before enhanced reality visualization will be accepted for routine 
use an accurate and robust method of performing the visualization must be 
developed. Our method makes significant progress towards this goal. Better 
methods of generating models will make it practical to apply enhanced reality 
visualization to many more situations. Because it is anchored in the real world 
it has the potential to affect our everyday lives. For example, enhanced reality 
visualization can be used to transcend the I imitations of computer monitors. By 
moving the display off the desk and into a visor or pair of goggles the display 
becomes much more useful. Multiple virtual screens can be defined. These 
virtual screens are anchored in the real world so the user, rather than fumbling 
with a mouse to expose the desired window, can simply look around and exam¬ 
ine the contents of the various virtual screens. In effect the size of the display 
becomes unlimited. Since the virtual screens are anchored in the real world 
they are spatially organized which is a powerful organizational metaphor. For 
example, you can define a virtual screen containing a phone list and attach it 
to your bulletin board. Whenever you needtochecka phonenumber onthelist 
all that need be done is look at the bulletin board. The phone list will always 
be where you posted it. 


8.3 Discussion 

A new method for performing enhanced reality visualization has been devel¬ 
oped. The method achieves good results using just a few fiducials placed near 
the volume of interest. Noncoplanar fiducials yield better results. Our method 
allows for motion and automatically corrects for changes to the internal cam¬ 
era parameters (focal length, focus, aperture, etc.) making it particularly well 
suited to enhanced reality visualization in dynamicenvironments. In asurgical 
application, we pi ace a few fiducials placed near the surgical site immediately 
prior to surgery. An initial calibration is performed using a laser scanner. Af¬ 
ter the calibration is performed, our method is fully automatic, runs in nearly 
real-time and is accurate to a fraction of a pixel and requires only a single 
i mage. 
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Appendix A 

Effects of Radial Distortion 


As discussed in previous chapters, our method only models a linear approxima¬ 
tion to radial distortion. In this appendix we will consider the consequences of 
this approximations. Radial distortion is typically modeled as follows [SIama, 
1980]: 


,r' u 

= x' d + Sx 


(A.l) 

y'u 

= </d + h 


(A. 2) 

Sx 

= (*'d - x o) 

(ATr'd + A 2 r'd) 

(A. 3) 

6y 

= ( y'd - yo) 

(ATr'd + /v 2 /-'d) 

(A. 4) 


Where A u and y' u are the undistorted pixel coordinates, A d and y d are the dis¬ 
torted pixel coordinates with r' d = (r' d - x 0 ) 2 + (y' d - y 0 ) 2 , and AT and AT are 
the radial distortion coefficients. If AT and A 2 are known, it is quite simple 
to use this radial distortion model to correct the data before passing it to our 
method. However, what if AT and AT are not known? The linear approximation 
contained in our method works well if radial distortion is not too great. It also 
works well if the fiducial s are near the location of the enhancement. 

As a preliminary step, we measured the radial distortion for our camera 
setup. Radial distortion was measured using the plumb-line method [Brown, 
1971, Stein, 1993] for a 16mm and 25mm lens. The results are summarized in 
TableA.l. Plots of the distortion arealso shown in Figures A-l and A-2. 

I n order to gain some insight intothe effect of radial distortion on our method 
we will attempt to quantify the difference between the radial distortion model 


Lens 

•'0 

yo 

AT 

AT 

16mm 

336 

246 

2.0e-8 

2.0e-13 

25mm 

332 

248 

-4.0e-8 

1.0e-13 


TableA.l: Radial distortion parameters for a 16mm and 25mm lens. 
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described above and the I inear approximation included in our method. To quan¬ 
tify the difference, a set of 5 three dimensional control points K were selected 
and a perspective transformation V (composed of a rigid transformation and a 
camera calibration matrix as described in Chapter 4) was defined. The control 
points were then projected onto the image plane using (4.2) producing a set of 
undistorted image points I u . The undistorted image points werethen distorted 
by 6d = [<5,r 6y ] based on theradial distortion measured aboveproducing a set of 
distorted image points / d . Now we have a set of control (model) points and a set 
of distorted image points. This is exactly the information that our method uses 
tocomputea perspectivetransformation. Wecomputea new perspective trans¬ 
formation V using (4.8) through (4.10). The effect of the linear approximation 
on an arbitrary three dimensional point M can be expressed as follows: 


v = \\(MV + 6d) — MV'\\ 2 (A.5) 

Figures A-3 and A-4 show two plots of y for a plane parallel to the image 
plane and passing through the control points. The five spikes mark the image 
locations of the five control points. N otice that globally y is no worse than the 
raw radial distortion. Further, y is small in the vicinity of the control points 
even when the control points are located at the edge of the image. These two 
characteristics were true for all of the simulations that we ran. This confirms 
the claim that our method works well if the radial distortion is not too great or 
the enhancement is close to the fiducials. 

Finally, we performed an experiment using a lens with significant distortion. 
Figure A-5 shows an image taken using a 4.8mm lens. The test object is the 
same one which appeared in earlier figures. Thedistortion is readily apparent. 
Figures A-6 through A-8 show the results of our method without correcting for 
distortion. 

I n general, a more accurate model produces more accurate results. Correct¬ 
ing for radial distortion undoubtedly would improve the results of our method. 
Flowever, as shown in this appendixfor enhanced reality visualization using 
reasonable cameras in realistic viewing situations the improvement is almost 
insignificant. 
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FigureA-1: Radial distortion in pixels 
for a 16mm lens. 



FigureA-3: Effect of radial distortion 
on our method with thefiducials near 
the center of the image. 



FigureA-2: Radial distortion in pixels 
for a 25mm lens. 



Figure A-4: Effect of radial distortion 
on our method with thefiducials near 
the edge of the i mage. 
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Figure A-5: I mage with signifi- Figure A-6: Distorted image 
cant distortion. view 1. 



Figure A-7: Distorted image Figure A-8: Distorted image 
view 2. view 3. 
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